Shiyu Zhou

ORCID: 0000-0002-6889-0316
Research Areas
  • Speech Recognition and Synthesis
  • Music and Audio Processing
  • Natural Language Processing Techniques
  • Speech and Audio Processing
  • Topic Modeling
  • Ferroelectric and Piezoelectric Materials
  • Microwave Dielectric Ceramics Synthesis
  • Advanced Sensor and Energy Harvesting Materials
  • Advanced Image and Video Retrieval Techniques
  • Retinal Imaging and Analysis
  • Multimodal Machine Learning Applications
  • Machine Learning in Healthcare
  • Multiferroics and related materials
  • Acoustic Wave Resonator Technologies
  • Thermal Expansion and Ionic Conductivity
  • Dielectric properties of ceramics
  • Advanced Fiber Optic Sensors
  • Surface Modification and Superhydrophobicity
  • Electronic and Structural Properties of Oxides
  • Complex Network Analysis Techniques
  • Spectroscopy and Chemometric Analyses
  • Glaucoma and retinal disorders
  • Water Quality Monitoring and Analysis
  • Advanced Text Analysis Techniques
  • Web Data Mining and Analysis

Shenzhen Institutes of Advanced Technology
2021-2024

Chinese Academy of Sciences
2013-2024

Shandong Institute of Automation
2017-2024

Dalian Polytechnic University
2024

Shanghai University
2024

Shaanxi University of Science and Technology
2021-2022

Institute of Automation
2018-2021

University of Chinese Academy of Sciences
2017-2018

Abstract In this study, a high-entropy perovskite oxide Sr(Zr0.2Sn0.2Hf0.2Ti0.2Nb0.2)O3 (SZSHTN) was first introduced into Na0.5Bi0.5TiO3 (NBT) lead-free ferroelectric ceramics to boost both the high-temperature dielectric stability and the energy storage performance. Excellent comprehensive performance was obtained simultaneously in the 0.8NBT–0.2SZSHTN ceramic, with a high ε′ value (> 2000), a wide ε′-temperature stable range (TCC < 5%, 52.4–362 °C), a low tan δ (< 0.01, 90–341 °C), and good energy storage performance (W_rec = 3.52 J/cm³, η varies...

10.1111/jace.18455 article EN Journal of the American Ceramic Society 2022-04-01
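
For reference, the recoverable energy density W_rec and the efficiency η quoted above follow the standard definitions used for dielectric energy storage; this is the usual convention, not notation taken from the paper:

W_{\mathrm{rec}} = \int_{P_r}^{P_{\max}} E \, dP, \qquad \eta = \frac{W_{\mathrm{rec}}}{W_{\mathrm{rec}} + W_{\mathrm{loss}}}

where P_max is the maximum polarization under the applied field E, P_r is the remnant polarization, and W_loss is the hysteresis loss given by the area enclosed by the P–E loop.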

Wav2vec 2.0 is a recently proposed self-supervised framework for speech representation learning. It follows a two-stage training process of pre-training and fine-tuning, and performs well in speech recognition tasks, especially in ultra-low-resource cases. In this work, we attempt to extend it to speaker verification and language identification. First, we use some preliminary experiments to indicate that wav2vec 2.0 can capture information about the speaker and language. Then we demonstrate its effectiveness on the two tasks respectively. For verification,...

10.21437/interspeech.2021-1280 article EN Interspeech 2021 2021-08-27
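
A minimal sketch of how wav2vec 2.0 can be reused as a frame-level feature extractor for utterance-level classification tasks such as language identification: the pre-trained encoder is mean-pooled and topped with a linear classifier. The checkpoint name, pooling and head below are illustrative assumptions, not the configuration used in the paper.

import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class Wav2Vec2Classifier(nn.Module):
    """wav2vec 2.0 encoder + mean pooling + linear classifier (illustrative)."""
    def __init__(self, num_classes: int, pretrained: str = "facebook/wav2vec2-base"):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(pretrained)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) raw 16 kHz audio
        hidden = self.encoder(waveform).last_hidden_state   # (batch, frames, dim)
        pooled = hidden.mean(dim=1)                         # utterance-level embedding
        return self.head(pooled)                            # class logits

# Example: 10-way language identification on 3 s of dummy audio
model = Wav2Vec2Classifier(num_classes=10)
logits = model(torch.randn(2, 48000))
print(logits.shape)  # torch.Size([2, 10])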

Abstract In pulse power systems, multilayer ceramic capacitors (MLCCs) encounter significant challenges due to the heightened loading electric field (E), which can lead to fatigue damage and ultrasonic concussion caused by electrostrictive strain. To address these issues, an innovative strategy focused on achieving an ultra-weak polarization–strain coupling effect is proposed, which effectively reduces the strain in MLCCs. Remarkably, an ultra-low electrostrictive coefficient (Q33) of 0.012 m⁴ C⁻² is achieved in the composition...

10.1002/adma.202406219 article EN Advanced Materials 2024-08-12
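
The electrostrictive coefficient Q33 quoted above relates the longitudinal strain S3 to the polarization P3 through the standard quadratic electrostriction law (standard notation, not taken from the paper):

S_3 = Q_{33} \, P_3^{2}

so lowering Q33 directly weakens the polarization–strain coupling and, with it, the electrostrictive strain that drives fatigue in MLCCs.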

Sequence-to-sequence attention-based models have recently shown very promising results on automatic speech recognition (ASR) tasks, which integrate an acoustic, pronunciation and language model into a single neural network. Among these models, the Transformer, a new sequence-to-sequence model relying entirely on self-attention without using RNNs or convolutions, achieves single-model state-of-the-art BLEU on neural machine translation (NMT) tasks. Given the outstanding performance of the Transformer, we extend it to speech and concentrate on it as the basic...

10.21437/interspeech.2018-1107 article EN Interspeech 2018 2018-08-28
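
The self-attention the abstract refers to is the scaled dot-product attention of the original Transformer; a minimal sketch of the standard formulation (not code from the paper):

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5       # (..., q_len, k_len)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Self-attention: queries, keys and values all come from the same sequence
x = torch.randn(2, 50, 64)                               # (batch, frames, model_dim)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                                         # torch.Size([2, 50, 64])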

Sequence-to-sequence attention-based models integrate an acoustic, pronunciation and language model into a single neural network, which makes them very suitable for multilingual automatic speech recognition (ASR). In this paper, we are concerned with multilingual ASR on low-resource languages by the Transformer, one of the sequence-to-sequence models. Sub-words are employed as the modeling unit without using any lexicon. First, we show that the multilingual ASR Transformer performs well despite some language confusion. We then look at incorporating...

10.48550/arxiv.1806.05059 preprint EN other-oa arXiv (Cornell University) 2018-01-01

There are several domains that own corresponding widely used feature extractors, such as ResNet, BERT, and GPT-x. These models are usually pre-trained on large amounts of unlabeled data by self-supervision and can be effectively applied to downstream tasks. In the speech domain, wav2vec2.0 starts to show its powerful representation ability and feasibility of ultra-low-resource speech recognition on the Librispeech corpus, which belongs to the audiobook domain. However, it has not been examined in real spoken scenarios and languages other...

10.48550/arxiv.2012.12121 preprint EN other-oa arXiv (Cornell University) 2020-01-01

End-to-end models have achieved impressive results on the task of automatic speech recognition (ASR). For low-resource ASR tasks, however, labeled data can hardly satisfy the demand of end-to-end models. Self-supervised acoustic pre-training has already shown its amazing performance, while the transcription is still inadequate for language modeling in end-to-end models. In this work, we fuse a pre-trained acoustic encoder (wav2vec2.0) and a pre-trained linguistic encoder (BERT) into an end-to-end model. The fused model only needs to learn the transfer from speech to language during...

10.1109/lsp.2021.3071668 article EN IEEE Signal Processing Letters 2021-01-01
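
A rough sketch of the kind of fusion the abstract describes: a pre-trained acoustic encoder feeds a pre-trained linguistic encoder through a small trainable adapter, so that mainly the transfer from speech to language has to be learned during fine-tuning. The checkpoint names, adapter design and per-frame output below are assumptions for illustration, not the published architecture.

import torch
import torch.nn as nn
from transformers import Wav2Vec2Model, BertModel

class FusedASRSketch(nn.Module):
    """Pre-trained acoustic encoder + adapter + pre-trained linguistic encoder (illustrative only)."""
    def __init__(self, vocab_size: int):
        super().__init__()
        self.acoustic = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        self.linguistic = BertModel.from_pretrained("bert-base-uncased")
        # Trainable adapter mapping acoustic frames into BERT's embedding space
        self.adapter = nn.Linear(self.acoustic.config.hidden_size,
                                 self.linguistic.config.hidden_size)
        self.output = nn.Linear(self.linguistic.config.hidden_size, vocab_size)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        frames = self.acoustic(waveform).last_hidden_state          # (B, T, 768)
        adapted = self.adapter(frames)                              # (B, T, 768)
        # Feed adapted acoustic frames to BERT as pre-computed embeddings
        fused = self.linguistic(inputs_embeds=adapted).last_hidden_state
        return self.output(fused)                                   # per-frame token logits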

Nowadays, most methods for end-to-end contextual speech recognition bias the recognition process towards contextual knowledge. Since all-neural contextual biasing methods rely on phrase-level contextual modeling and attention-based relevance modeling, they may suffer from confusion between similar context-specific phrases, which hurts predictions at the token level. In this work, we focus on mitigating these problems with fine-grained contextual knowledge selection (FineCoS). In FineCoS, we introduce fine-grained knowledge to reduce the uncertainty of token predictions. Specifically, we first apply phrase...

10.1109/icassp43922.2022.9747101 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

Sequence-to-sequence attention-based models have recently shown very promising results on automatic speech recognition (ASR) tasks, which integrate an acoustic, pronunciation and language model into a single neural network. Among these models, the Transformer, a new sequence-to-sequence model relying entirely on self-attention without using RNNs or convolutions, achieves single-model state-of-the-art BLEU on neural machine translation (NMT) tasks. Given the outstanding performance of the Transformer, we extend it to speech and concentrate on it as...

10.48550/arxiv.1804.10752 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Recently, end-to-end (E2E) models have become a competitive alternative to the conventional hybrid automatic speech recognition (ASR) systems. However, they still suffer from speaker mismatch between training and testing conditions. In this paper, we use the Speech-Transformer (ST) as the study platform to investigate speaker-aware training of E2E models. We propose a model called Speaker-Aware Speech-Transformer (SAST), which is a standard ST equipped with a speaker attention module (SAM). The SAM has a static speaker knowledge block (SKB) that is made of i-vectors. At each...

10.1109/asru46091.2019.9003844 article EN 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019-12-01
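
A minimal sketch of the kind of speaker attention module described above: each encoder frame attends over a fixed bank of i-vectors (the static speaker knowledge block) and the attended speaker vector is added back to the frame. Dimensions, bank size and the way the speaker vector is combined are illustrative assumptions, not the published configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeakerAttentionModule(nn.Module):
    """Attend over a static bank of i-vectors to build a per-frame speaker embedding (illustrative)."""
    def __init__(self, model_dim: int, ivector_dim: int, num_speakers: int):
        super().__init__()
        # Static knowledge block: one i-vector per training speaker (kept frozen in this sketch)
        self.skb = nn.Parameter(torch.randn(num_speakers, ivector_dim), requires_grad=False)
        self.query = nn.Linear(model_dim, ivector_dim)
        self.out = nn.Linear(ivector_dim, model_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, model_dim) encoder states
        q = self.query(frames)                                   # (B, T, d_iv)
        scores = q @ self.skb.t() / self.skb.size(-1) ** 0.5     # (B, T, num_speakers)
        weights = F.softmax(scores, dim=-1)
        speaker_vec = weights @ self.skb                         # (B, T, d_iv)
        return frames + self.out(speaker_vec)                    # speaker-aware frames

sam = SpeakerAttentionModule(model_dim=256, ivector_dim=100, num_speakers=500)
print(sam(torch.randn(2, 80, 256)).shape)  # torch.Size([2, 80, 256])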

End-to-end models have been showing superiority in Automatic Speech Recognition (ASR). At the same time, the capacity of streaming recognition has become a growing requirement for end-to-end models. Following these trends, an encoder-decoder recurrent neural network called the Recurrent Neural Aligner (RNA) has been freshly proposed and has shown its competitiveness on two English ASR tasks. However, it is not clear whether RNA can be further improved and applied to other spoken languages. In this work, we explore...

10.21437/interspeech.2018-1086 article EN Interspeech 2018 2018-08-28

This paper proposes a novel approach to pre-train an encoder-decoder sequence-to-sequence (seq2seq) model with unpaired speech and transcripts respectively. Our pre-training method is divided into two stages, named acoustic pre-training and linguistic pre-training. In the acoustic pre-training stage, we use a large amount of speech to pre-train the encoder by predicting masked feature chunks given their context. In the linguistic pre-training stage, we generate synthesized speech from a large number of transcripts using a single-speaker text-to-speech (TTS) system, and use the synthesized paired data to pre-train the decoder. The two-stage pre-training integrates rich acoustic and linguistic knowledge into the seq2seq...

10.48550/arxiv.1910.12418 preprint EN other-oa arXiv (Cornell University) 2019-01-01
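
A toy illustration of the acoustic pre-training objective described above: random contiguous chunks of input features are masked and the encoder is trained to reconstruct them from the surrounding context. The chunk size, masking rate, stand-in encoder and loss below are illustrative assumptions, not the paper's setup.

import torch
import torch.nn as nn

def mask_feature_chunks(feats: torch.Tensor, chunk: int = 10, p: float = 0.15):
    """Zero out random contiguous chunks of frames; return masked features and the mask."""
    feats = feats.clone()
    mask = torch.zeros(feats.shape[:2], dtype=torch.bool)
    for b in range(feats.size(0)):
        for t in range(0, feats.size(1) - chunk, chunk):
            if torch.rand(1).item() < p:
                feats[b, t:t + chunk] = 0.0
                mask[b, t:t + chunk] = True
    return feats, mask

# One acoustic pre-training step: the encoder reconstructs the masked chunks from context
encoder = nn.LSTM(80, 80, num_layers=2, batch_first=True)   # stand-in for the real encoder
feats = torch.randn(4, 300, 80)                             # (batch, frames, feature_dim)
masked, mask = mask_feature_chunks(feats)
pred, _ = encoder(masked)
loss = nn.functional.l1_loss(pred[mask], feats[mask])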

End-to-end models are gaining wider attention in the field of automatic speech recognition (ASR). One of their advantages is the simplicity of building a system that directly recognizes the speech frame sequence into the text label sequence by neural networks. According to the driving end of the recognition process, end-to-end ASR models could be categorized into two types: label-synchronous and frame-synchronous, each of which has its unique model behaviour and characteristic. In this work, we make a detailed comparison between a representative label-synchronous model (Transformer) and a soft frame-synchronous...

10.48550/arxiv.2005.10113 preprint EN other-oa arXiv (Cornell University) 2020-01-01

In this paper, we propose an Omni-perception Pre-Trainer (OPT) for cross-modal understanding and generation, by jointly modeling visual, text and audio resources. OPT is constructed in an encoder-decoder framework, including three single-modal encoders to generate token-based embeddings for each modality, a cross-modal encoder to encode the correlations among the three modalities, and two cross-modal decoders to generate text and image respectively. For OPT's pre-training, we design a multi-task pretext learning scheme to model multi-modal resources from different data...

10.48550/arxiv.2107.00249 preprint EN cc-by arXiv (Cornell University) 2021-01-01

The shared-hidden-layer multilingual deep neural network (SHL-MDNN), in which the hidden layers of a feed-forward deep neural network (DNN) are shared across multiple languages while the softmax layers are language dependent, has been shown to be effective on acoustic modeling for low-resource speech recognition. In this paper, we propose that the same scheme with Long Short-Term Memory (LSTM) recurrent neural networks can achieve further performance improvement, considering that LSTM has outperformed DNN as the acoustic model for automatic speech recognition (ASR). Moreover, we reveal...

10.21437/interspeech.2017-111 article EN Interspeech 2017 2017-08-16
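
A compact sketch of the shared-hidden-layer idea with an LSTM: the recurrent layers are shared across languages while each language keeps its own softmax output layer, so low-resource languages benefit from the shared representation. Layer sizes and the per-language head dictionary below are illustrative assumptions.

import torch
import torch.nn as nn

class SharedHiddenLayerLSTM(nn.Module):
    """Shared LSTM acoustic model with language-dependent softmax layers (illustrative)."""
    def __init__(self, feat_dim: int, hidden: int, senones_per_lang: dict):
        super().__init__()
        self.shared = nn.LSTM(feat_dim, hidden, num_layers=3, batch_first=True)
        # One output (softmax) layer per language
        self.heads = nn.ModuleDict({lang: nn.Linear(hidden, n)
                                    for lang, n in senones_per_lang.items()})

    def forward(self, feats: torch.Tensor, lang: str) -> torch.Tensor:
        hidden, _ = self.shared(feats)          # shared across all languages
        return self.heads[lang](hidden)         # language-dependent logits

model = SharedHiddenLayerLSTM(feat_dim=40, hidden=512,
                              senones_per_lang={"sw": 3000, "vi": 3500})
logits = model(torch.randn(4, 200, 40), lang="sw")
print(logits.shape)  # torch.Size([4, 200, 3000])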

Recently, there are several domains that have their own feature extractors, such as ResNet, BERT, and GPT-x, which are widely used for various down-stream tasks. These models are pre-trained on large amounts of unlabeled data by self-supervision. In the speech domain, wav2vec2.0 starts to show its powerful representation ability and feasibility of ultra-low-resource speech recognition. This extractor is pre-trained on a monolingual audiobook corpus, whereas it has not been thoroughly examined in real spoken scenarios and languages other...

10.1109/ijcnn52387.2021.9533587 article EN 2021 International Joint Conference on Neural Networks (IJCNN) 2021-07-18

End-to-end (E2E) models have achieved promising results on multiple speech recognition benchmarks, and have shown the potential to become mainstream. However, the unified structure and the E2E training hamper injecting context information into them for contextual biasing. Though contextual LAS (CLAS) gives an excellent all-neural solution, the degree of biasing on a given context phrase is not explicitly controllable. In this paper, we focus on incorporating context information into the continuous integrate-and-fire (CIF) based model that supports contextual biasing in a more controllable...

10.1109/icassp39728.2021.9415054 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13
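
For reference, the continuous integrate-and-fire (CIF) mechanism mentioned above accumulates a per-frame weight until it crosses a threshold of 1.0, at which point it "fires" and emits one token-level acoustic embedding. A simplified, non-batched sketch follows; the weight predictor and the exact firing/carry-over details are simplified here and not taken from the paper.

import torch

def cif(frames: torch.Tensor, weights: torch.Tensor, threshold: float = 1.0):
    """Integrate per-frame weights; fire one label-level embedding each time the threshold is reached.

    frames: (T, D) encoder outputs; weights: (T,) non-negative weights in [0, 1].
    """
    fired, acc_w, acc_h = [], 0.0, torch.zeros(frames.size(1))
    for h, a in zip(frames, weights.tolist()):
        if acc_w + a < threshold:            # keep integrating
            acc_w += a
            acc_h = acc_h + a * h
        else:                                # fire: spend part of the weight, carry the rest over
            spend = threshold - acc_w
            fired.append(acc_h + spend * h)
            acc_w = a - spend
            acc_h = acc_w * h
    return torch.stack(fired) if fired else torch.empty(0, frames.size(1))

emb = cif(torch.randn(100, 256), torch.rand(100) * 0.3)
print(emb.shape)  # (number of fired tokens, 256)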

The choice of modeling units is critical to automatic speech recognition (ASR) tasks. Conventional ASR systems typically choose context-dependent states (CD-states) or context-dependent phonemes (CD-phonemes) as their modeling units. However, this choice has been challenged by sequence-to-sequence attention-based models, which integrate an acoustic, pronunciation and language model into a single neural network. On English ASR tasks, previous attempts have already shown that the modeling unit of graphemes can outperform that of phonemes. In this paper,...

10.48550/arxiv.1805.06239 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Wav2vec 2.0 is a recently proposed self-supervised framework for speech representation learning. It follows a two-stage training process of pre-training and fine-tuning, and performs well in speech recognition tasks, especially in ultra-low-resource cases. In this work, we attempt to extend it to speaker verification and language identification. First, we use some preliminary experiments to indicate that wav2vec 2.0 can capture information about the speaker and language. Then we demonstrate its effectiveness on the two tasks respectively. For verification,...

10.48550/arxiv.2012.06185 preprint EN other-oa arXiv (Cornell University) 2020-01-01