- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Speech and Dialogue Systems
- Topic Modeling
- Natural Language Processing Techniques
- Social Robot Interaction and HRI
- Intelligent Tutoring Systems and Adaptive Learning
- IoT-based Smart Home Systems
- AI in Service Interactions
- Multimodal Machine Learning Applications
- Machine Learning and ELM
University of Cambridge
2018-2021
User simulators are one of the major tools that enable offline training of task-oriented dialogue systems. For this task the Agenda-Based User Simulator (ABUS) is often used. The ABUS is based on hand-crafted rules and its output is in semantic form. Issues arise from both properties, such as limited diversity and the inability to interface a text-level belief tracker. This paper introduces the Neural User Simulator (NUS), whose behaviour is learned from a corpus and which generates natural language, hence needing a less labelled dataset than simulators...
Bo-Hsiang Tseng, Yinpei Dai, Florian Kreyssig, Bill Byrne. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
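The agenda-based simulation that the NUS improves upon can be sketched minimally: the user goal becomes a stack of semantic acts, popped by hand-crafted rules. The slot names and act types below are illustrative, not taken from the paper.

```python
# Toy sketch of an agenda-based user simulator (ABUS-style). The user goal is
# a stack of semantic dialogue acts; a hand-crafted rule pops the top act each
# turn. Output is semantic, not natural language - the limitation the NUS
# abstract highlights. All slot/act names are hypothetical.
class AgendaBasedSimulator:
    def __init__(self, goal):
        # agenda: inform acts for each goal constraint, then a request act
        self.agenda = [("inform", slot, value) for slot, value in goal.items()]
        self.agenda.append(("request", "phone", None))

    def next_act(self, system_act=None):
        # hand-crafted rule: always emit the top of the agenda stack
        if self.agenda:
            return self.agenda.pop()
        return ("bye", None, None)

sim = AgendaBasedSimulator({"food": "thai", "area": "centre"})
acts = [sim.next_act() for _ in range(4)]
```

A corpus-driven simulator like the NUS replaces this rule-based stack with a learned model emitting text directly.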
In this paper, we propose Discriminative Neural Clustering (DNC), which formulates data clustering with a maximum number of clusters as a supervised sequence-to-sequence learning problem. Compared to traditional unsupervised clustering algorithms, DNC learns clustering patterns from training data without requiring an explicit definition of a similarity measure. An implementation based on the Transformer architecture is shown to be effective on a speaker diarisation task using the challenging AMI dataset. Since AMI contains only 147 complete...
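Casting clustering as sequence-to-sequence learning requires the output labels to be permutation-invariant: two label sequences that differ only in how speakers are named must map to the same target. A common canonicalisation, sketched here as an illustrative re-implementation, indexes clusters by order of first appearance.

```python
# Canonicalise cluster labels by first appearance, so that e.g. the speaker
# sequences B-A-A-C-B and A-B-B-C-A yield the same seq2seq target. This is a
# sketch of the label encoding idea behind DNC, not the paper's exact code.
def canonicalise_labels(labels):
    mapping, out = {}, []
    for lab in labels:
        if lab not in mapping:
            mapping[lab] = len(mapping)  # first-seen speaker gets next index
        out.append(mapping[lab])
    return out

target = canonicalise_labels(["B", "A", "A", "C", "B"])  # -> [0, 1, 1, 2, 0]
```

With labels canonicalised this way, a Transformer can be trained to emit the cluster-index sequence directly from the sequence of speaker embeddings.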
Time delay neural networks (TDNNs) are an effective acoustic model for large vocabulary speech recognition. The strength of the model can be attributed to its ability to effectively model long temporal contexts. However, current TDNN models are relatively shallow, which limits their modelling capability. This paper proposes a method of increasing network depth by deepening the kernel used in the temporal convolutions. The best performing kernel consists of three fully connected layers with a residual (ResNet) connection from the output of the first to the third layer....
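The "deepened kernel" idea can be sketched in a few lines: instead of a single linear projection over the spliced context, the kernel becomes a small stack of fully connected layers with a residual connection. Dimensions and weight scales below are illustrative, assuming a 3-frame splice of 8-dimensional features.

```python
import numpy as np

# Sketch of a deepened TDNN kernel: three fully connected layers with a
# residual connection from the first layer's output to the third layer's
# output. A minimal NumPy sketch with toy dimensions, not the paper's setup.
rng = np.random.default_rng(0)
d_in, d_hid = 8 * 3, 8   # spliced context of 3 frames x 8-dim features
W1 = rng.standard_normal((d_in, d_hid)) * 0.1
W2 = rng.standard_normal((d_hid, d_hid)) * 0.1
W3 = rng.standard_normal((d_hid, d_hid)) * 0.1

def deep_kernel(x):
    h1 = np.maximum(0.0, x @ W1)   # first FC layer (ReLU)
    h2 = np.maximum(0.0, h1 @ W2)  # second FC layer
    return h1 + h2 @ W3            # third layer plus residual from the first

frames = rng.standard_normal((5, d_in))  # 5 spliced input frames
out = deep_kernel(frames)
```

The shallow baseline would be a single `x @ W` projection; the deep kernel adds depth per convolution step without changing the TDNN's temporal structure.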
This paper describes PyHTK, a Python-based library and associated pipeline that facilitates the construction of large-scale complex automatic speech recognition (ASR) systems using the hidden Markov model toolkit (HTK). PyHTK can be used to generate sophisticated artificial neural network (ANN) models with versatile architectures by converting a compact configuration file defining the ANN into the form required by HTK tools, as well as supporting a range of capabilities to train and test ANN models. The ASR pipeline is divided into multiple...
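The compact-configuration idea can be illustrated with a toy parser that expands a one-line network spec into per-layer definitions. The shorthand format below is hypothetical and is not PyHTK's actual configuration syntax.

```python
# Illustrative sketch of expanding a compact ANN spec into per-layer
# definitions, in the spirit of PyHTK's config-to-HTK conversion. The
# "feat:... hidden:WxD out:..." shorthand is invented for this example.
def expand_config(spec):
    parts = dict(p.split(":") for p in spec.split())
    width, depth = map(int, parts["hidden"].split("x"))
    layers = [("input", int(parts["feat"]))]
    layers += [("sigmoid", width)] * depth       # D hidden layers of width W
    layers.append(("softmax", int(parts["out"])))
    return layers

layers = expand_config("feat:40 hidden:512x5 out:6000")
```

A real tool would then serialise each layer tuple into the verbose model definition format that the downstream toolkit expects.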
Reinforcement learning (RL) is a promising approach to dialogue policy optimisation, but traditional RL algorithms fail to scale to large domains. Recently, Feudal Dialogue Management (FDM) has been shown to increase scalability to large domains by decomposing the dialogue management decision into two steps, making use of the domain ontology to abstract the dialogue state in each step. In order to abstract the state space, however, previous work on FDM relies on handcrafted feature functions. In this work, we show that these feature functions can be learned jointly with the model...
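The two-step feudal decomposition can be sketched with stub policies: a master policy first chooses an action family from an abstract state view, then a sub-policy picks the concrete action. In the paper both the policies and the state feature functions are learned; the rules below are placeholders.

```python
# Minimal sketch of Feudal Dialogue Management's two-step decision. The master
# policy chooses between slot-independent and slot-dependent action families;
# a sub-policy then selects the concrete action. Rule-based stubs stand in for
# the learned policies and feature functions.
def master_policy(state):
    # abstract, slot-agnostic view of the state drives the family choice
    return "slot_dependent" if state["unfilled_slots"] else "slot_independent"

def sub_policy(family, state):
    if family == "slot_dependent":
        return ("request", state["unfilled_slots"][0])
    return ("inform", "summary")

state = {"unfilled_slots": ["area"]}
action = sub_policy(master_policy(state), state)
```

The scalability gain comes from each step facing a much smaller action space than a flat policy over all (act, slot) combinations would.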
Cross-domain natural language generation (NLG) is still a difficult task within spoken dialogue modelling. Given a semantic representation provided by the dialogue manager, the generator should generate sentences that convey the desired information. Traditional template-based generators can produce sentences with all necessary information, but these sentences are not sufficiently diverse. With RNN-based models, the diversity of the generated sentences can be high, however, in the process some information is lost. In this work, we improve an RNN-based generator by considering...
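The template-vs-neural trade-off the abstract describes can be made concrete: a template generator never drops slots but produces one fixed surface form, and a simple coverage check exposes when any generator omits information. The template and slot names are illustrative.

```python
# Sketch of the NLG task: realise a dialogue act plus slot-value pairs as a
# sentence. The template generator below is information-complete but has zero
# diversity; the coverage check would flag a neural generator that drops a
# slot value. Templates and slots are hypothetical examples.
def template_nlg(act, slots):
    if act == "inform":
        return "It is a " + " ".join(slots.values()) + " restaurant."
    return "Sorry, I did not understand."

def covers_all_slots(sentence, slots):
    # crude check: every slot value must appear verbatim in the sentence
    return all(value in sentence for value in slots.values())

slots = {"food": "thai", "price": "cheap"}
sentence = template_nlg("inform", slots)
```

An RNN-based generator would replace `template_nlg` with a learned decoder; the challenge the abstract points to is keeping `covers_all_slots` true while gaining diversity.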
In this paper, we propose a semi-supervised learning (SSL) technique for training deep neural networks (DNNs) to generate speaker-discriminative acoustic embeddings (speaker embeddings). Obtaining large amounts of speaker recognition training data can be difficult for desired target domains, especially under privacy constraints. The proposed technique reduces the requirements for labelled data by leveraging unlabelled data. The technique is a variant of virtual adversarial training (VAT) [1] in the form of a loss that is defined as the robustness of the speaker embedding against...
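The VAT-style smoothness loss can be sketched as: perturb the input slightly and penalise how much the embedding moves. Real VAT finds the perturbation direction by power iteration on the gradient; the sketch below uses a single random direction and a toy linear embedder, so it is illustrative only.

```python
import numpy as np

# Sketch of a VAT-style loss on speaker embeddings: the loss measures how
# much a (length-normalised) embedding changes under a small input
# perturbation. A random direction stands in for the adversarial one found by
# power iteration in real VAT; the linear embedder is a toy stand-in.
rng = np.random.default_rng(1)
W = rng.standard_normal((20, 4)) * 0.5

def embed(x):
    e = x @ W
    return e / np.linalg.norm(e)           # length-normalised embedding

def vat_loss(x, eps=1e-2):
    d = rng.standard_normal(x.shape)
    d = eps * d / np.linalg.norm(d)        # small perturbation of size eps
    diff = embed(x + d) - embed(x)
    return float(np.sum(diff ** 2))        # embedding should barely move

x = rng.standard_normal(20)
loss = vat_loss(x)
```

Because this loss needs no speaker labels, it can be computed on unlabelled data, which is what makes the approach semi-supervised.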
One of the difficulties in training dialogue systems is the lack of training data. We explore the possibility of creating dialogue data through the interaction between a dialogue system and a user simulator. Our goal is to develop a modelling framework that can incorporate new scenarios through self-play between the two agents. In this framework, we first pre-train the agents on a collection of source domain dialogues, which equips them to converse with each other via natural language. With further fine-tuning on a small amount of target domain data, the agents continue to interact with the aim of improving their...
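The self-play loop itself is simple to sketch: a system agent and a user-simulator agent alternate utterances until the user ends the dialogue, and the generated dialogues become new training data. The trivial rule-based stubs below stand in for the pre-trained, fine-tuned models.

```python
# Sketch of the self-play loop between a user-simulator agent and a system
# agent. Both agents here are rule-based stubs standing in for pre-trained
# neural models; the loop structure is the point.
def user_agent(history):
    # the user ends the dialogue once the booking is confirmed
    return "bye" if any("booked" in h for h in history) else "book a table"

def system_agent(history):
    return "booked" if history and "book" in history[-1] else "how can I help?"

def self_play(max_turns=6):
    history = []
    for _ in range(max_turns):
        utterance = user_agent(history)
        history.append(utterance)
        if utterance == "bye":
            break
        history.append(system_agent(history))
    return history

dialogue = self_play()
```

In the framework the abstract describes, dialogues generated this way in the target domain provide the extra data that further fine-tuning exploits.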
Self-supervised learning via masked prediction pre-training (MPPT) has shown impressive performance on a range of speech-processing tasks. This paper proposes a method to bias self-supervised learning towards a specific task. The core idea is to slightly finetune the model that is used to obtain the target sequence. This leads to better performance and a substantial increase in training speed. Furthermore, this variant of MPPT allows low-footprint streaming models to be trained effectively by computing the loss on unmasked frames. These approaches are...
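The masked-prediction loss and the unmasked-frame variant mentioned in the abstract can be sketched together: a per-frame cross-entropy against discrete targets, averaged over either the masked positions only or over all frames. The model outputs and target sequence below are toy stand-ins.

```python
import numpy as np

# Sketch of the MPPT loss: per-frame cross-entropy against discrete targets,
# normally averaged over masked frames only. The `loss_on_unmasked` flag
# illustrates the streaming-friendly variant from the abstract that also
# scores unmasked frames. Logits and targets are random toy data.
rng = np.random.default_rng(2)
logits = rng.standard_normal((10, 5))     # model outputs: 10 frames, 5 classes
targets = rng.integers(0, 5, size=10)     # target-sequence labels per frame
mask = np.zeros(10, dtype=bool)
mask[2:5] = True                          # frames 2-4 are masked

def mppt_loss(logits, targets, mask, loss_on_unmasked=False):
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    nll = -np.log(probs[np.arange(len(targets)), targets])
    keep = np.ones_like(mask) if loss_on_unmasked else mask
    return float(nll[keep].mean())

masked_only = mppt_loss(logits, targets, mask)
all_frames = mppt_loss(logits, targets, mask, loss_on_unmasked=True)
```

Biasing, as the abstract describes it, changes where `targets` comes from: a slightly finetuned model produces the target sequence instead of a purely self-supervised one.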
This paper presents a novel natural gradient and Hessian-free (NGHF) optimisation framework for neural network training that can operate efficiently in a distributed manner. It relies on the linear conjugate gradient (CG) algorithm to combine the natural gradient (NG) method with local curvature information from Hessian-free (HF) or other second-order methods. A solution to a numerical issue in CG allows effective parameter updates to be generated with far fewer iterations than usually used (e.g. 5-8 instead of 200). This work also presents a preconditioning approach...
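The CG machinery at the heart of such second-order methods solves G d = g for the update direction d, where G is a curvature matrix (Fisher for NG, Gauss-Newton for HF) and g is the gradient, running only a handful of iterations. The sketch below applies textbook linear CG to a small synthetic positive-definite system; it illustrates the solver, not the paper's full framework.

```python
import numpy as np

# Textbook linear conjugate gradient solving G d = g for the update
# direction, as used inside NG/HF-style optimisers. G is a small synthetic
# symmetric positive-definite matrix standing in for the Fisher or
# Gauss-Newton curvature; only a few iterations are run.
rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))
G = A @ A.T + 6 * np.eye(6)        # synthetic SPD curvature matrix
g = rng.standard_normal(6)         # gradient vector

def conjugate_gradient(G, g, iters=8):
    d = np.zeros_like(g)
    r = g - G @ d                  # residual; starts equal to g
    p = r.copy()                   # initial search direction
    for _ in range(iters):
        alpha = (r @ r) / (p @ G @ p)
        d += alpha * p
        r_new = r - alpha * (G @ p)
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return d

d = conjugate_gradient(G, g)       # approximate update direction
```

On this 6-dimensional system CG converges to machine precision within the 8 iterations; the paper's contribution includes making similarly few iterations suffice at neural-network scale.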