- Speech Recognition and Synthesis
- Natural Language Processing Techniques
- Metallurgy and Material Forming
- Topic Modeling
- Music and Audio Processing
- Metal Alloys Wear and Properties
- Diverse Industrial Engineering Technologies
- Speech and Audio Processing
- Information Retrieval and Search Behavior
- Language and cultural evolution
- Recommender Systems and Techniques
- Mobile Crowdsensing and Crowdsourcing
- Expert finding and Q&A systems
- Ferroelectric and Negative Capacitance Devices
- Web Data Mining and Analysis
- Domain Adaptation and Few-Shot Learning
- Microstructure and Mechanical Properties of Steels
- Genomics and Phylogenetic Studies
- Reinforcement Learning in Robotics
- Machine Learning and Data Classification
- Neural Networks and Applications
- Algorithms and Data Compression
- Engineering Technology and Methodologies
- Advanced Memory and Neural Computing
- Machine Learning and Algorithms
Google (Switzerland)
2023
Samara State Technical University
2023
Meta (Israel)
2019-2022
École des hautes études en sciences sociales
2022
Institute of Forensic Science
2021
École Normale Supérieure
2021
National University of Science and Technology
2013-2019
National University of Science and Technology
2005-2019
Meta (United States)
2019
University of Glasgow
2012-2015
We introduce a new collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. It is derived from open-source audio books from the LibriVox project. It contains over 60K hours of audio, which is, to our knowledge, the largest freely-available corpus of speech. The audio has been segmented using voice activity detection and is tagged with SNR, speaker ID and genre descriptions. Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero...
We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. We show how existing audio tokenizers provide different trade-offs between reconstruction quality and long-term structure, and we propose a hybrid tokenization scheme to achieve both objectives. Namely, we leverage the discretized activations of a masked language model pre-trained on audio to capture long-term structure and the discrete codes produced by a neural audio codec...
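The core idea above — treating audio generation as language modeling over discrete tokens — can be illustrated with a minimal sketch. This is not AudioLM's actual model; it is a toy bigram next-token model over hypothetical token sequences standing in for a real tokenizer's output:

```python
import random
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count next-token statistics over discrete token sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def continue_sequence(counts, prefix, length, rng):
    """Sample a continuation, choosing each next token in proportion
    to how often it followed the previous token during training."""
    seq = list(prefix)
    for _ in range(length):
        options = counts.get(seq[-1])
        if not options:
            break  # unseen context: stop generating
        tokens, weights = zip(*options.items())
        seq.append(rng.choices(tokens, weights=weights)[0])
    return seq

# Toy "semantic token" sequences (hypothetical data, not real codec output).
data = [[0, 1, 2, 3, 0, 1, 2, 3], [0, 1, 2, 0, 1, 2]]
model = train_bigram(data)
out = continue_sequence(model, [0, 1], 4, random.Random(0))
```

A real system replaces the bigram counts with a Transformer and the toy tokens with learned semantic and acoustic codes, but the generation loop has the same shape: condition on the prefix, sample the next discrete token, repeat.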
We propose using self-supervised discrete representations for the task of speech resynthesis. To generate a disentangled representation, we separately extract low-bitrate representations for speech content, prosodic information, and speaker identity. This allows us to synthesize speech in a controllable manner. We analyze various state-of-the-art, self-supervised representation learning methods and shed light on the advantages of each method while considering reconstruction quality and disentanglement properties. Specifically, we evaluate the F0 reconstruction,...
Contrastive Predictive Coding (CPC), based on predicting future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of the speech signal. However, it still under-performs compared to other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library, which we adapt and optimize for the specificities of CPC (raw waveform input, contrastive loss, past-versus-future structure). We find that applying augmentation only to the segment from which the prediction is performed yields better...
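Time-domain augmentation of the kind described above operates directly on the raw waveform. The following is a minimal sketch of that idea using NumPy; the specific transforms (random gain, circular time shift, additive noise) and parameter values are illustrative assumptions, not WavAugment's actual API:

```python
import numpy as np

def augment_waveform(wave, rng, noise_scale=0.01, max_shift=160):
    """Apply simple time-domain augmentations to a raw waveform:
    random gain, circular time shift, and additive Gaussian noise."""
    out = wave * rng.uniform(0.8, 1.2)                        # random gain
    out = np.roll(out, rng.integers(-max_shift, max_shift))   # time shift
    out = out + rng.normal(0.0, noise_scale, size=out.shape)  # noise
    return out.astype(np.float32)

# One second of a 440 Hz tone at 16 kHz, standing in for real speech.
rng = np.random.default_rng(0)
wave = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
aug = augment_waveform(wave, rng)
```

Because the transforms act on samples rather than on spectral features, the augmented signal can be fed unchanged into a raw-waveform encoder such as CPC's.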
We introduce SPEAR-TTS, a multi-speaker text-to-speech (TTS) system that can be trained with minimal supervision. By combining two types of discrete speech representations, we cast TTS as a composition of two sequence-to-sequence tasks: from text to high-level semantic tokens (akin to “reading”) and from semantic tokens to low-level acoustic tokens (“speaking”). Decoupling these two tasks enables training of the “speaking” module using abundant audio-only data, and unlocks the highly efficient combination of pretraining and backtranslation to reduce...
We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech, with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM, and the linguistic knowledge present only in text-based models such as PaLM-2. We demonstrate...
We introduce dGSLM, the first “textless” model able to generate audio samples of naturalistic spoken dialogues. It uses recent work on unsupervised spoken unit discovery coupled with a dual-tower transformer architecture with cross-attention, trained on 2000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels. We show that our model is able to generate speech, laughter, and other paralinguistic signals in the two channels simultaneously, and reproduces more fluid turn taking compared to a text-based cascaded model.
We present SoundStorm, a model for efficient, non-autoregressive audio generation. SoundStorm receives as input the semantic tokens of AudioLM, and relies on bidirectional attention and confidence-based parallel decoding to generate the tokens of a neural audio codec. Compared to the autoregressive generation approach of AudioLM, our model produces audio of the same quality and with higher consistency in voice and acoustic conditions, while being two orders of magnitude faster. SoundStorm generates 30 seconds of audio in 0.5 seconds on a TPU-v4. We demonstrate the ability of our model to scale audio generation to longer sequences by...
Words categorize the semantic fields they refer to in ways that maximize communication accuracy while minimizing complexity. Focusing on the well-studied color domain, we show that artificial neural networks trained with deep-learning techniques to play a discrimination game develop color-naming systems whose distribution on the accuracy/complexity plane closely matches that of human languages. The observed variation among emergent color-naming systems is explained by different degrees of discriminative need, of a sort that might also characterize...
Eugene Kharitonov, Rahma Chaabouni, Diane Bouchacourt, Marco Baroni. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations. 2019.
We present the Zero Resource Speech Challenge 2021, which asks participants to learn a language model directly from audio, without any text or labels. The challenge is based on the Libri-light dataset, which provides up to 60k hours of audio from English audio books without any associated text. We provide a pipeline baseline system consisting of an encoder based on contrastive predictive coding (CPC), a quantizer ($k$-means) and a standard language model (BERT or LSTM). The metrics evaluate the learned representations at the acoustic (ABX discrimination), lexical...
Despite renewed interest in emergent language simulations with neural networks, little is known about the basic properties of the induced code, and how they compare to human language. One fundamental characteristic of the latter, known as Zipf's Law of Abbreviation (ZLA), is that more frequent words are efficiently associated to shorter strings. We study whether the same pattern emerges when two neural networks, a "speaker" and a "listener", are trained to play a signaling game. Surprisingly, we find that the networks develop an \emph{anti-efficient} encoding...
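The ZLA pattern referenced above is easy to check mechanically on a corpus: under ZLA, more frequent words should be shorter on average. A minimal sketch, using a made-up toy token list rather than any real emergent-language data:

```python
from collections import Counter

def zla_check(corpus_tokens):
    """Crude ZLA check: compare the mean word length of the more-frequent
    half of the vocabulary against the less-frequent half."""
    freqs = Counter(corpus_tokens)
    pairs = sorted(((f, len(w)) for w, f in freqs.items()), reverse=True)
    half = len(pairs) // 2
    top_mean = sum(length for _, length in pairs[:half]) / half
    bottom_mean = sum(length for _, length in pairs[half:]) / (len(pairs) - half)
    return top_mean, bottom_mean  # ZLA-consistent when top_mean < bottom_mean

# ZLA-consistent toy corpus: the frequent words ("the", "a", "of") are short.
tokens = ("the a the of a the extraordinarily "
          "considerations the of a miscellaneous").split()
top_mean, bottom_mean = zla_check(tokens)
```

An anti-efficient code, as the paper's emergent languages turn out to be, would show the opposite inequality: the most frequent messages being among the longest.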
Online Learning to Rank is a powerful paradigm that allows us to train ranking models using only online feedback from users. In this work, we consider a Federated Online Learning to Rank setup (FOLtR) where on-mobile ranking models are trained in a way that respects the users' privacy. We require that user data, such as queries, results, and their feature representations, are never communicated for the purpose of the ranker's training. We believe this setup is interesting, as it combines unique requirements for the learning algorithm: (a) preserving user privacy, (b) low communication...
Eugene Kharitonov, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Paden Tomasello, Ann Lee, Ali Elkahky, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations. 2022.
Eugene Kharitonov, Ann Lee, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Morgane Riviere, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
Natural language allows us to refer to novel composite concepts by combining expressions denoting their parts according to systematic rules, a property known as \emph{compositionality}. In this paper, we study whether the language emerging in deep multi-agent simulations possesses a similar ability to refer to novel primitive combinations, and whether it accomplishes this feat by strategies akin to human-language compositionality. Equipped with new ways to measure compositionality in emergent languages inspired by disentanglement in representation...
Online evaluation methods, such as A/B and interleaving experiments, are widely used for search engine evaluation. Since they rely on noisy implicit user feedback, running each experiment takes a considerable time. Recently, the problem of reducing the duration of online experiments has received substantial attention from the research community. However, the possibility of using sequential statistical testing procedures to reduce the time required remains less studied. Such procedures allow an experiment to stop early, once the data collected is...
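Sequential testing of the kind mentioned above can be illustrated with Wald's classic sequential probability ratio test (SPRT) for a Bernoulli success rate; this is a generic textbook procedure, not necessarily the one studied in the paper, and the hypothesized rates and error levels below are illustrative assumptions:

```python
import math

def sprt(samples, p0=0.5, p1=0.6, alpha=0.05, beta=0.05):
    """Wald's SPRT for a Bernoulli rate: accumulate the log-likelihood
    ratio of H1 (p=p1) vs H0 (p=p0) per observation, and stop as soon
    as it crosses either decision boundary."""
    upper = math.log((1 - beta) / alpha)   # cross above: accept H1
    lower = math.log(beta / (1 - alpha))   # cross below: accept H0
    llr = 0.0
    for n, x in enumerate(samples, start=1):
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue", len(samples)

# A run of consistent successes lets the test stop well before
# all 40 observations are consumed.
decision, n_used = sprt([1] * 40)
```

The appeal for online experimentation is exactly this early stopping: a clearly winning (or losing) variant ends the experiment after far fewer user interactions than a fixed-horizon test would require.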
Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022.
Query suggestion or auto-completion mechanisms are widely used by search engines and are increasingly attracting interest from the research community. However, the lack of a commonly accepted evaluation methodology and metrics means that it is not possible to compare the results of approaches in the literature. Moreover, the metrics often used to evaluate query suggestions tend to be an adaptation from other domains without a proper justification. Hence, it is not necessarily clear if the improvements reported in the literature would result in an actual improvement...
Despite their failure to solve the compositional SCAN dataset, seq2seq architectures still achieve astonishing success on more practical tasks. This observation pushes us to question the usefulness of SCAN-style compositional generalization in realistic NLP tasks. In this work, we study the benefit that such compositionality brings to several machine translation tasks. We present several focused modifications of the Transformer that greatly improve its generalization capabilities on SCAN, and select one that remains on par with a vanilla Transformer on a standard machine translation (MT) task. Next, we study its performance...
Query suggestion or auto-completion mechanisms help users to type less while interacting with a search engine. A basic approach that ranks suggestions according to their frequency in query logs is suboptimal. Firstly, many candidate queries with the same prefix can be removed as redundant. Secondly, the suggestions can also be personalised based on the user's context. These two directions to improve the mechanisms' quality can be in opposition: while the latter aims to promote suggestions that address search intents the user is likely to have, the former aims to diversify the suggestions to cover as many intents as possible. We introduce...
Studies of discrete languages emerging when neural agents communicate to solve a joint task often look for evidence of compositional structure. This stems from the expectation that such a structure would allow a language to be acquired faster by the agents and enable them to generalize better. We argue that these beneficial properties are only loosely connected to compositionality. In two experiments, we demonstrate that, depending on the task, non-compositional languages might show equal, or better, generalization performance and acquisition speed than...