NFDI4DS | UHH-SEMS - Publication Details

Zhizheng Wu

ORCID: 0009-0001-1192-9857

About

Contact & Profiles

A5102765381

Research Areas

Speech Recognition and Synthesis
Speech and Audio Processing
Music and Audio Processing
Natural Language Processing Techniques
Phonetics and Phonology Research
Speech and dialogue systems
Topic Modeling
Music Technology and Sound Studies
Hate Speech and Cyberbullying Detection
Advanced Data Compression Techniques
Authorship Attribution and Profiling
Adversarial Robustness in Machine Learning
Law, AI, and Intellectual Property
Voice and Speech Disorders
Sensor Technology and Measurement Systems
Generative Adversarial Networks and Image Synthesis
Video Analysis and Summarization
Linguistics and Cultural Studies
Particle physics theoretical and experimental studies
Advanced Electrical Measurement Techniques
Digital Media Forensic Detection
Advanced Adaptive Filtering Techniques
Bioethics and Human Rights Issues
Marine and Coastal Research
Advancements in PLL and VCO Technologies

Chinese University of Hong Kong, Shenzhen
2022-2025

Shanghai Artificial Intelligence Laboratory
2025

Shenzhen Research Institute of Big Data
2023-2025

Jingdong (China)
2019

University of Edinburgh
2014-2017

Apple (United States)
2017

Nanyang Technological University
2010-2015

Shanghai University
2015

Edinburgh College
2015

National Institute of Informatics
2014

Spoofing and countermeasures for speaker verification: A survey

OPENALEX - Publications

Zhizheng Wu, Nicholas Evans, Tomi Kinnunen, Junichi Yamagishi, Federico Alegre and 1 more

10.1016/j.specom.2014.10.005 article EN Speech Communication 2014-11-04

ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge

Zhizheng Wu, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Cemal Hanilçi and 2 more

An increasing number of independent studies have confirmed the vulnerability of automatic speaker verification (ASV) technology to spoofing. However, in comparison to that involving other biometric modalities, spoofing and countermeasure research for ASV is still in its infancy. A current barrier to progress is the lack of standards, which impedes the comparison of results generated by different researchers. The ASVspoof initiative aims to overcome this bottleneck through the provision of standard corpora, protocols and metrics to support a common...

10.21437/interspeech.2015-462 article EN Interspeech 2015 2015-09-06

Merlin: An Open Source Neural Network Speech Synthesis System

Zhizheng Wu, Oliver Watts, Simon King

We introduce the Merlin speech synthesis toolkit for neural network-based speech synthesis. The system takes linguistic features as input, and employs neural networks to predict acoustic features, which are then passed to a vocoder to produce the speech waveform. Various neural network architectures are implemented, including a standard feedforward neural network, mixture density neural network, recurrent neural network (RNN), and long short-term memory (LSTM) recurrent neural network, amongst others. The toolkit is Open Source, written in Python, and is extensible. This paper briefly describes the system, and provides some...

10.21437/ssw.2016-33 article EN 2016-09-13

Deep neural networks employing Multi-Task Learning and stacked bottleneck features for speech synthesis

Zhizheng Wu, Cassia Valentini-Botinhao, Oliver Watts, Simon King

Deep neural networks (DNNs) use a cascade of hidden representations to enable the learning of complex mappings from input to output features. They are able to learn the complex mapping from text-based linguistic features to speech acoustic features, and so perform text-to-speech synthesis. Recent results suggest that DNNs can produce more natural synthetic speech than conventional HMM-based statistical parametric systems. In this paper, we show that the hidden representation used within a DNN can be improved through the use of Multi-Task Learning, and that stacking...

10.1109/icassp.2015.7178814 article EN 2015-04-01

ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge

Zhizheng Wu, Junichi Yamagishi, Tomi Kinnunen, Cemal Hanilçi, Md Sahidullah and 4 more

Concerns regarding the vulnerability of automatic speaker verification (ASV) technology against spoofing can undermine confidence in its reliability and form a barrier to exploitation. The absence of competitive evaluations and the lack of common datasets have hampered progress in developing effective spoofing countermeasures. This paper describes the ASV Spoofing and Countermeasures (ASVspoof) initiative, which aims to fill this void. Through the provision of a common dataset, protocols, and metrics, ASVspoof promotes a sound research methodology...

10.1109/jstsp.2017.2671435 article EN IEEE Journal of Selected Topics in Signal Processing 2017-02-17

The Voice Conversion Challenge 2016

Tomoki Toda, Ling-Hui Chen, Daisuke Saito, Fernando Villavicencio, Mirjam Wester and 2 more

This paper describes the Voice Conversion Challenge 2016 devised by the authors to better understand different voice conversion (VC) techniques by comparing their performance on a common dataset. The task of the challenge was speaker conversion, i.e., to transform the identity of a source speaker into that of a target speaker while preserving the linguistic content. Using a common dataset consisting of 162 utterances for training and 54 utterances for evaluation from each of 5 source and 5 target speakers, 17 groups working in VC around the world developed their own VC systems for every combination of the source and target speakers, i.e., 25...

10.21437/interspeech.2016-1066 article EN Interspeech 2016 2016-08-29

Vulnerability of speaker verification systems against voice conversion spoofing attacks: The case of telephone speech

Tomi Kinnunen, Zhizheng Wu, Kong Aik Lee, Filip Sedlak, Eng Siong Chng and 1 more

Voice conversion - the methodology of automatically converting one's utterances to sound as if spoken by another speaker - presents a threat for applications relying on speaker verification. We study the vulnerability of text-independent speaker verification systems against voice conversion attacks using telephone speech. We implemented voice conversion systems with two types of features and nonparallel frame alignment methods, and five speaker verification systems ranging from simple Gaussian mixture models (GMMs) to a state-of-the-art joint factor analysis (JFA) recognizer. Experiments on a subset...

10.1109/icassp.2012.6288895 article EN 2012-03-01

Exemplar-Based Sparse Representation With Residual Compensation for Voice Conversion

Zhizheng Wu, Tuomas Virtanen, Eng Siong Chng, Haizhou Li

We propose a nonparametric framework for voice conversion, that is, exemplar-based sparse representation with residual compensation. In this framework, a spectrogram is reconstructed as a weighted linear combination of speech segments, called exemplars, which span multiple consecutive frames. The linear combination weights are constrained to be sparse to avoid over-smoothing, and high-resolution spectra are employed in the exemplars directly without dimensionality reduction to maintain spectral details. In addition, compression...

10.1109/taslp.2014.2333242 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2014-06-25

Detecting converted speech and natural speech for anti-spoofing attack in speaker recognition

Zhizheng Wu, Eng Siong Chng, Haizhou Li

Voice conversion techniques present a threat to speaker verification systems. To enhance the security of speaker verification systems, we study how to automatically distinguish natural speech and synthetic/converted speech. Motivated by the research on phase spectrum in speech perception, in this study, we propose to use features derived from the phase spectrum to detect converted speech. The features are tested under three different training situations of the converted speech detector: a) only Gaussian mixture model (GMM) based converted speech data are available; b) only unit-selection based converted speech data are available; c) no converted speech data are available for...

10.21437/interspeech.2012-465 article EN Interspeech 2012 2012-09-09

A study on replay attack and anti-spoofing for text-dependent speaker verification

Zhizheng Wu, Sheng Gao, Eng Siong Chng, Haizhou Li

Replay, which is to play back a pre-recorded speech sample, presents a genuine risk to automatic speaker verification technology. In this study, we evaluate the vulnerability of text-dependent speaker verification systems under the replay attack using a standard benchmarking database, and also propose an anti-spoofing technique to safeguard the speaker verification systems. The key idea of the spoofing detection technique is to decide whether the presented sample is matched to any previous stored speech samples based on a similarity score. The experiments conducted on the RSR2015 database showed that...

10.1109/apsipa.2014.7041636 article EN 2014-12-01

Investigating gated recurrent networks for speech synthesis

Zhizheng Wu, Simon King

Recently, recurrent neural networks (RNNs) as powerful sequence models have re-emerged as a potential acoustic model for statistical parametric speech synthesis (SPSS). The long short-term memory (LSTM) architecture is particularly attractive because it addresses the vanishing gradient problem in standard RNNs, making them easier to train. Although recent studies have demonstrated that LSTMs can achieve significantly better performance on SPSS than deep feedforward neural networks, little is known about why....

10.1109/icassp.2016.7472657 article EN 2016-03-01

Synthetic speech detection using temporal modulation feature

Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li

Voice conversion and speaker adaptation techniques present a threat to current state-of-the-art speaker verification systems. To prevent such spoofing attacks and enhance the security of speaker verification systems, the development of anti-spoofing techniques to distinguish synthetic and human speech is necessary. In this study, we continue the quest to discriminate synthetic and human speech. Motivated by the facts that current analysis-synthesis techniques operate on the frame level and make the frame-by-frame independence assumption, we propose to adopt magnitude/phase modulation features to detect synthetic speech from human speech. Modulation...

10.1109/icassp.2013.6639067 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2013-05-01

A study of speaker adaptation for DNN-based speech synthesis

Zhizheng Wu, Paweł Świętojański, Christophe Veaux, Steve Renals, Simon King

A major advantage of statistical parametric speech synthesis (SPSS) over unit-selection speech synthesis is its adaptability and controllability in changing speaker characteristics and speaking style. Recently, several studies using deep neural networks (DNNs) as acoustic models for SPSS have shown promising results. However, the adaptability of DNNs in SPSS has not been systematically studied. In this paper, we conduct an experimental analysis of speaker adaptation for DNN-based speech synthesis at different levels. In particular, we augment a low-dimensional...

10.7488/ds/259 article EN 2015-09-06

Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance

Zhizheng Wu, Phillip L. De León, Cenk Demiroğlu, Ali Khodabakhsh, Simon King and 6 more

In this paper, we present a systematic study of the vulnerability of automatic speaker verification to a diverse range of spoofing attacks. We start with a thorough analysis of the spoofing effects of five speech synthesis and eight voice conversion systems, and the vulnerability of three speaker verification systems under those attacks. We then introduce a number of countermeasures to prevent spoofing attacks from both known and unknown attackers. Known attackers are spoofing systems whose output was used to train the countermeasures, while an unknown attacker is a spoofing system whose output was not available to the countermeasures during training. Finally, we benchmark against...

10.1109/taslp.2016.2526653 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2016-02-08

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin and 14 more

While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering that speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing different attributes and generate them individually. Motivated by it, we propose NaturalSpeech 3, a TTS system with novel factorized diffusion...

10.48550/arxiv.2403.03100 preprint EN arXiv (Cornell University) 2024-03-05

Conditional restricted Boltzmann machine for voice conversion

Zhizheng Wu, Eng Siong Chng, Haizhou Li

The conventional statistical-based transformation functions for voice conversion have been shown to suffer from over-smoothing and over-fitting problems. The over-smoothing problem arises because of the statistical average during estimating the model parameters for the transformation function. In addition, the large number of parameters in the statistical model cannot be well estimated from the limited parallel training data, which will result in the over-fitting problem. In this work, we investigate a robust transformation function for voice conversion using a conditional restricted Boltzmann machine. The conditional restricted Boltzmann machine, which performs linear...

10.1109/chinasip.2013.6625307 article EN 2013-07-01

From HMMs to DNNs: Where do the improvements come from?

Oliver Watts, Gustav Eje Henter, Thomas Merritt, Zhizheng Wu, Simon King

Deep neural networks (DNNs) have recently been the focus of much text-to-speech research as a replacement for decision trees and hidden Markov models (HMMs) in statistical parametric synthesis systems. Performance improvements have been reported; however, the configuration of systems evaluated makes it impossible to judge how much of the improvement is due to the new machine learning methods, and how much is due to other novel aspects of the systems. Specifically, whereas the decision trees in HMM-based systems typically operate at the state-level, and separate trees are used to handle separate acoustic streams, most...

10.1109/icassp.2016.7472730 article EN 2016-03-01

Joint Speaker Verification and Antispoofing in the i-Vector Space

A. S. Sizov, Elie Khoury, Tomi Kinnunen, Zhizheng Wu, Sébastien Marcel

Any biometric recognizer is vulnerable to spoofing attacks and hence voice biometric, also called automatic speaker verification (ASV), is no exception; replay, synthesis, and conversion attacks all provoke false acceptances unless countermeasures are used. We focus on voice conversion (VC) attacks, considered as one of the most challenging for modern recognition systems. To detect spoofing, most existing countermeasures assume explicit or implicit knowledge of a particular VC system and focus on designing discriminative features. In this paper, we explore back-end...

10.1109/tifs.2015.2407362 article EN IEEE Transactions on Information Forensics and Security 2015-02-26

SAS: A speaker verification spoofing database containing diverse attacks

Zhizheng Wu, Ali Khodabakhsh, Cenk Demiroğlu, Junichi Yamagishi, Daisuke Saito and 2 more

This paper presents the first version of a speaker verification spoofing and anti-spoofing database, named SAS corpus. The corpus includes nine spoofing techniques, two of which are speech synthesis, and seven are voice conversion. We design two protocols, one for standard speaker verification evaluation, and the other for producing spoofing materials. Hence, they allow the speech synthesis community to produce spoofing materials incrementally without knowledge of speaker verification spoofing and anti-spoofing. To provide a set of preliminary results, we conducted speaker verification experiments using two state-of-the-art systems. Without...

10.1109/icassp.2015.7178810 article EN 2015-04-01

Analysis of the Voice Conversion Challenge 2016 Evaluation Results

Mirjam Wester, Zhizheng Wu, Junichi Yamagishi

The Voice Conversion Challenge 2016 is the first Voice Conversion Challenge in which different voice conversion systems and approaches using the same voice data were compared. This paper describes the design of the evaluation, and presents the results and statistical analyses of the results.

10.21437/interspeech.2016-1331 article EN Interspeech 2016 2016-08-28

Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System

Tim Capes, Paul Coles, Alistair Conkie, Ladan Golipour, Abie Hadjitarkhani and 13 more

10.21437/interspeech.2017-1798 article EN Interspeech 2017 2017-08-16

A study of speaker adaptation for DNN-based speech synthesis

Zhizheng Wu, Paweł Świętojański, Christophe Veaux, Steve Renals, Simon King

A major advantage of statistical parametric speech synthesis (SPSS) over unit-selection speech synthesis is its adaptability and controllability in changing speaker characteristics and speaking style. Recently, several studies using deep neural networks (DNNs) as acoustic models for SPSS have shown promising results. However, the adaptability of DNNs in SPSS has not been systematically studied. In this paper, we conduct an experimental analysis of speaker adaptation for DNN-based speech synthesis at different levels. In particular, we augment a low-dimensional...

10.21437/interspeech.2015-270 article EN Interspeech 2015 2015-09-06
