NFDI4DS | UHH-SEMS - Publication Details

Robin Scheibler

ORCID: 0000-0002-5205-8365

About

Contact & Profiles

A5020401831

Research Areas

Speech and Audio Processing
Advanced Adaptive Filtering Techniques
Blind Source Separation Techniques
Music and Audio Processing
Speech Recognition and Synthesis
Advancements in Photolithography Techniques
Acoustic Wave Phenomena Research
Indoor and Outdoor Localization Technologies
Direction-of-Arrival Estimation Techniques
Sparse and Compressive Sensing Techniques
Underwater Acoustics Research
Hearing Loss and Rehabilitation
Advanced Wireless Communication Techniques
Numerical Methods and Algorithms
Computer Graphics and Visualization Techniques
Digital Image Processing Techniques
Computational Geometry and Mesh Generation
Advanced Data Compression Techniques
Infant Health and Development
Image and Signal Denoising Methods
VLSI and FPGA Design Techniques
Animal Vocal Communication and Behavior
Model Reduction and Neural Networks
Advanced Numerical Analysis Techniques
Advancements in PLL and VCO Technologies

Line Corporation (Japan)
2020-2023

Tokyo Metropolitan University
2018-2020

École Polytechnique Fédérale de Lausanne
2009-2018

The University of Tokyo
2018

École Normale Supérieure - PSL
2013

IBM Research - Zurich
2012

Pyroomacoustics: A Python Package for Audio Room Simulation and Array Processing Algorithms

OPENALEX - Publications

Robin Scheibler, Eric Bezzam, Ivan Dokmanić

We present pyroomacoustics, a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the package can be divided into three main components: an intuitive Python object-oriented interface to quickly construct different simulation scenarios involving multiple sound sources and microphones in 2D and 3D rooms; a fast C implementation of the image source model for general polyhedral rooms to efficiently generate room impulse responses and simulate the propagation between sources and receivers;...

10.1109/icassp.2018.8461310 preprint EN 2018-04-01

Independent Vector Analysis with More Microphones Than Sources

Robin Scheibler, Nobutaka Ono

We extend frequency-domain blind source separation based on independent vector analysis to the case where there are more microphones than sources. The signal is modelled as non-Gaussian sources in a Gaussian background. The proposed algorithm is based on a parametrization of the demixing matrix decreasing the number of parameters to estimate. Furthermore, orthogonal constraints between the signal and background subspaces are imposed to regularize the separation. The problem can then be posed as a constrained likelihood maximization. We propose efficient...

10.1109/waspaa.2019.8937080 preprint EN 2019-10-01

Fast and Stable Blind Source Separation with Rank-1 Updates

Robin Scheibler, Nobutaka Ono

We propose a new algorithm for the blind source separation of acoustic sources. This is an alternative to the popular auxiliary function based independent vector analysis using iterative projection (AuxIVA-IP). It optimizes the same cost function, but instead of alternate updates of the rows of the demixing matrix, we propose a sequence of rank-1 updates. Remarkably, and unlike the previous method, the resulting updates do not require matrix inversion. Moreover, their computational complexity is quadratic in the number of microphones, rather than cubic...

10.1109/icassp40776.2020.9053556 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

Raking the Cocktail Party

Ivan Dokmanić, Robin Scheibler, Martin Vetterli

We present the concept of an acoustic rake receiver: a microphone beamformer that uses echoes to improve the noise and interference suppression. The rake idea is well-known in wireless communications; it involves constructively combining different multipath components that arrive at the receiver antennas. Unlike spread-spectrum signals used in wireless communications, speech signals are not orthogonal to their shifts. Therefore, we focus on the spatial structure, rather than the temporal one. Instead of explicitly estimating the channel, we create...

10.1109/jstsp.2015.2415761 article EN IEEE Journal of Selected Topics in Signal Processing 2015-03-23

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding

Yen‐Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, and 8 more

This paper presents recent progress on integrating speech separation and enhancement (SSE) into the ESPnet toolkit. Compared with the previous ESPnet-SE work, numerous features have been added, including state-of-the-art models with their respective training and evaluation recipes. Importantly, a new interface has been designed to flexibly combine speech enhancement front-ends with other tasks, including automatic speech recognition (ASR), speech translation (ST), and spoken language understanding (SLU). To showcase such integration, we performed experiments...

10.21437/interspeech.2022-10727 article EN Interspeech 2022 2022-09-16

Diffusion-Based Generative Speech Source Separation

Robin Scheibler, Youna Ji, Soo-Whan Chung, Jaeuk Byun, Soyeon Choe, and 1 more

We propose DiffSep, a new single channel source separation method based on score-matching of a stochastic differential equation (SDE). We craft a tailored continuous time diffusion-mixing process starting from the separated sources and converging to a Gaussian distribution centered on their mixture. This formulation lets us apply the machinery of score-based generative modelling. First, we train a neural network to approximate the score function of the marginal probabilities of the diffusion-mixing process. Then, we use it to solve the reverse SDE that...

10.1109/icassp49357.2023.10095310 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

A Fast Hadamard Transform for Signals With Sublinear Sparsity in the Transform Domain

Robin Scheibler, Saeid Haghighatshoar, Martin Vetterli

A new iterative low complexity algorithm has been presented for computing the Walsh-Hadamard transform (WHT) of an $N$ dimensional signal with a $K$-sparse WHT, where $N$ is a power of two and $K = O(N^\alpha)$ scales sub-linearly in $N$ for some $0 < \alpha < 1$. Assuming a random support model for the non-zero transform domain components, the algorithm reconstructs the WHT of the signal with a sample complexity $O(K \log_2(\frac{N}{K}))$ and a computational complexity $O(K\log_2(K)\log_2(\frac{N}{K}))$, with a very high probability asymptotically tending to 1. The approach is based on the subsampling...

10.1109/tit.2015.2404441 article EN IEEE Transactions on Information Theory 2015-02-16
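For orientation, the dense transform that the abstract above improves upon can be sketched in a few lines. This is the textbook $O(N \log_2 N)$ butterfly in natural (Hadamard) ordering, not the sparse algorithm of the paper; the function name `fwht` and the toy signal are illustrative assumptions.

```python
def fwht(x):
    """Unnormalized Walsh-Hadamard transform of a list whose length is a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        # Butterfly stage: combine blocks of size h into blocks of size 2h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    return y


# Toy example: a length-8 signal whose WHT has a single non-zero coefficient.
N = 8
spectrum = [0] * N
spectrum[3] = 8
# The transform is its own inverse up to a factor N.
signal = [v // N for v in fwht(spectrum)]
recovered = fwht(signal)
```

The dense transform touches all $N$ samples at every stage; the point of the paper is that a $K$-sparse spectrum can be recovered from far fewer samples.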

Neural Diarization with Non-Autoregressive Intermediate Attractors

Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa

End-to-end neural diarization (EEND) with encoder-decoder-based attractors (EDA) is a promising method to handle the whole speaker diarization problem simultaneously with a single neural network. While the EEND model can produce all frame-level speaker labels simultaneously, it disregards output label dependency. In this work, we propose a novel EEND model that introduces the label dependency between frames. The proposed method generates non-autoregressive intermediate attractors to produce speaker labels at the lower layers and conditions the subsequent layers on these labels. While the proposed model works in a non-autoregressive manner, the speaker labels are refined by...

10.1109/icassp49357.2023.10094824 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

Fast Independent Vector Extraction by Iterative SINR Maximization

Robin Scheibler, Nobutaka Ono

We propose fast independent vector extraction (FIVE), a new algorithm that blindly extracts a single non-Gaussian source from a Gaussian background. The algorithm iteratively computes beam-forming weights maximizing the signal-to-interference-and-noise ratio for an approximate noise covariance matrix. We demonstrate that this procedure minimizes the negative log-likelihood of the input data according to a well-defined probabilistic model. The minimization is carried out via the auxiliary function technique whereas, unlike related...

10.1109/icassp40776.2020.9053066 preprint EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

Joint Dereverberation and Separation With Iterative Source Steering

Taishi Nakashima, Robin Scheibler, Masahito Togami, Nobutaka Ono

We propose a new algorithm for joint dereverberation and blind source separation (DR-BSS). Our work builds upon the ILRMA-T framework that applies a unified filter combining dereverberation and separation. One drawback of this framework is that it requires several matrix inversions, an operation inherently costly and with potential stability issues. We leverage the recently introduced iterative source steering (ISS) updates to propose two algorithms mitigating this issue. Albeit derived from first principles, the first algorithm turns out to be a natural combination of weighted...

10.1109/icassp39728.2021.9413478 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Surrogate Source Model Learning for Determined Source Separation

Robin Scheibler, Masahito Togami

We propose to learn surrogate functions of universal speech priors for determined blind speech separation. Deep speech priors are highly desirable due to their superior modelling power, but are not compatible with state-of-the-art independent vector analysis based on majorization-minimization (AuxIVA), since deriving the required surrogate function is not easy, nor always possible. Instead, we do away with exact majorization and directly approximate the surrogate. Taking advantage of iterative source steering (ISS) updates, we back propagate...

10.1109/icassp39728.2021.9414255 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Independent Vector Analysis via Log-Quadratically Penalized Quadratic Minimization

Robin Scheibler

We propose a new algorithm for blind source separation (BSS) using independent vector analysis (IVA). This is an improvement over the popular auxiliary function based IVA (AuxIVA) with iterative projection (IP) or iterative source steering (ISS). We introduce iterative projection with adjustment (IPA), where we update one demixing filter and jointly adjust all the other sources along...

10.1109/tsp.2021.3072228 article EN IEEE Transactions on Signal Processing 2021-01-01

The Fukushima inverse problem

Marta Martinez-Camara, Ivan Dokmanić, Juri Ranieri, Robin Scheibler, Martin Vetterli, and 1 more

Knowing what amount of radioactive material was released from Fukushima in March 2011 is crucial to understand the scope of the consequences. Moreover, it could be used in forward simulations to obtain accurate maps of deposition. But these data are often not publicly available, or are of questionable quality. We propose to estimate the emission waveforms by solving an inverse problem. Previous approaches rely on a detailed expert guess of how the releases appeared, and they produce a solution strongly biased by this guess. If we...

10.1109/icassp.2013.6638477 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2013-05-01

FRIDA: FRI-based DOA estimation for arbitrary array layouts

Hanjie Pan, Robin Scheibler, Eric Bezzam, Ivan Dokmanić, Martin Vetterli

In this paper we present FRIDA, an algorithm for estimating directions of arrival of multiple wideband sound sources. FRIDA combines multi-band information coherently and achieves state-of-the-art resolution at extremely low signal-to-noise ratios. It works for arbitrary array layouts, but unlike the various steered response power and subspace methods, it does not require a grid search. FRIDA leverages recent advances in sampling signals with a finite rate of innovation. It is based on the insight that for any array layout,...

10.1109/icassp.2017.7952744 preprint EN 2017-03-01

Separake: Source Separation with a Little Help from Echoes

Robin Scheibler, Diego Di Carlo, Antoine Deleforge, Ivan Dokmanić

It is commonly believed that multipath hurts various audio processing algorithms. At odds with this belief, we show that multipath in fact helps sound source separation, even with very simple propagation models. Unlike most existing methods, we neither ignore the room impulse responses, nor attempt to estimate them fully. We rather assume that we know the positions of a few virtual microphones generated by echoes and we show how this gives us enough spatial diversity to get a performance boost over the anechoic case. We show improvements for two standard...

10.1109/icassp.2018.8461345 preprint EN 2018-04-01

Sub-Sample Time Delay Estimation via Auxiliary-Function-Based Iterative Updates

Kouei Yamaoka, Robin Scheibler, Nobutaka Ono, Yukoh Wakabayashi

We propose an efficient iterative method to estimate a sub-sample time delay between two signals. We formulate it as the optimization problem of maximizing the generalized cross correlation (GCC) of the two signals in terms of a continuous time delay parameter. The maximization is carried out with an auxiliary function method. First, we prove that when written as a sum of cosines, the GCC can be lower bounded at any point by a quadratic function. By repeatedly maximizing this lower-bound, an alternative update algorithm for the estimation of the time delay is derived. We follow...

10.1109/waspaa.2019.8937259 article EN 2019-10-01
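The auxiliary-function idea in the abstract above can be illustrated with a small sketch. It assumes one particular quadratic minorizer of the cosine, $\cos\theta \ge -\tfrac{1}{2}\tfrac{\sin\theta_0}{\theta_0}\theta^2 + \text{const}$ for $\theta_0 \in [-\pi, \pi]$, tight at $\theta = \theta_0$; the abstract only states that a quadratic bound exists, so this choice, the function names, and the synthetic noise-free phases are all assumptions made for illustration rather than the authors' implementation.

```python
import math

TWO_PI = 2.0 * math.pi


def gcc(tau, omegas, phases, weights):
    """GCC written as a weighted sum of cosines of the phase mismatch."""
    return sum(w * math.cos(om * tau - ph)
               for om, ph, w in zip(omegas, phases, weights))


def mm_step(tau, omegas, phases, weights):
    """Maximize the (assumed) quadratic lower bound of the GCC that is tight at tau."""
    num = den = 0.0
    for om, ph, w in zip(omegas, phases, weights):
        theta = om * tau - ph
        n = -round(theta / TWO_PI)      # integer shift that wraps theta
        wrapped = theta + TWO_PI * n    # now within [-pi, pi]
        # Curvature of the bound: sin(x)/x, with the limit 1 at x = 0.
        curv = math.sin(wrapped) / wrapped if abs(wrapped) > 1e-12 else 1.0
        num += w * curv * om * (ph - TWO_PI * n)
        den += w * curv * om * om
    return num / den  # assumes at least one bin has non-zero curvature


# Synthetic cross-spectrum phases for a hypothetical delay of 3.3 samples.
n_fft = 64
true_delay = 3.3
omegas = [TWO_PI * k / n_fft for k in range(1, n_fft // 2)]
phases = [(om * true_delay) % TWO_PI for om in omegas]
weights = [1.0] * len(omegas)

tau = 3.0  # integer-lag initialization, e.g. the peak of the sampled GCC
history = [gcc(tau, omegas, phases, weights)]
for _ in range(5):
    tau = mm_step(tau, omegas, phases, weights)
    history.append(gcc(tau, omegas, phases, weights))
```

Each step has a closed form because the bound is quadratic in the delay, and the objective stored in `history` never decreases, which is the defining property of a majorization-minimization scheme.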

SDR — Medium Rare with Fast Computations

Robin Scheibler

We revisit the widely used bss_eval metrics for source separation with an eye out for performance. We propose a fast algorithm fixing shortcomings of publicly available implementations. First, we show that the metrics are fully specified by the squared cosine of just two angles between estimate and reference subspaces. Second, large linear systems are involved. However, they are structured, and we apply an iterative method based on conjugate gradient descent. The complexity of this step is thus reduced by a factor quadratic in the distortion...

10.1109/icassp43922.2022.9747473 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27
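The squared-cosine view mentioned in the abstract above is easiest to see in a simplified setting. The sketch below assumes a one-dimensional reference subspace (a single reference signal with no distortion filter), which corresponds to the scale-invariant SDR rather than the full bss_eval metrics treated in the paper; the helper names and the toy signals are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sdr_from_cosine(estimate, reference):
    """SDR in dB from the squared cosine of the angle between the two signals."""
    cos2 = dot(estimate, reference) ** 2 / (
        dot(estimate, estimate) * dot(reference, reference))
    return 10.0 * math.log10(cos2 / (1.0 - cos2))


def sdr_by_projection(estimate, reference):
    """Same quantity, computed by projecting the estimate onto the reference."""
    alpha = dot(estimate, reference) / dot(reference, reference)
    target = [alpha * r for r in reference]
    error = [e - t for e, t in zip(estimate, target)]
    return 10.0 * math.log10(dot(target, target) / dot(error, error))


reference = [1.0, 0.0, -1.0, 0.5]
estimate = [0.9, 0.1, -1.1, 0.4]
sdr_a = sdr_from_cosine(estimate, reference)
sdr_b = sdr_by_projection(estimate, reference)
sdr_scaled = sdr_from_cosine([3.0 * e for e in estimate], reference)
```

Both routes give the same value because the projected energy is $\cos^2\theta\,\|\hat{s}\|^2$ and the residual energy is $(1 - \cos^2\theta)\,\|\hat{s}\|^2$; rescaling the estimate leaves the angle, and hence the metric, unchanged.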

Spatial Loss for Unsupervised Multi-channel Source Separation

Kohei Saijo, Robin Scheibler

10.21437/interspeech.2022-274 article EN Interspeech 2022 2022-09-16

Universal Score-based Speech Enhancement with High Content Preservation

Robin Scheibler, Yusuke Fujita, Yuma Shirahata, Tatsuya Komatsu

We propose UNIVERSE++, a universal speech enhancement method based on score-based diffusion and adversarial training. Specifically, we improve the existing UNIVERSE model that decouples clean speech feature extraction and diffusion. Our contributions are three-fold. First, we make several modifications to the network architecture, improving training stability and final performance. Second, we introduce an adversarial loss to promote learning high quality speech features. Third, we propose a low-rank adaptation scheme with a phoneme fidelity loss to improve content...

10.21437/interspeech.2024-138 article EN Interspeech 2024 2024-09-01

End-to-End Multi-Speaker ASR with Independent Vector Analysis

Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian

We develop an end-to-end system for multi-channel, multi-speaker automatic speech recognition. We propose a frontend for joint source separation and dereverberation based on the independent vector analysis (IVA) paradigm. It uses the fast and stable iterative source steering algorithm together with a neural source model. Unlike conventional beamforming, the number of speakers can be dynamically changed during or after training. The parameters from the ASR module and the neural source model are optimized jointly from the ASR loss itself. We demonstrate competitive...

10.1109/slt54892.2023.10023037 article EN 2022 IEEE Spoken Language Technology Workshop (SLT) 2023-01-09

MM Algorithms for Joint Independent Subspace Analysis with Application to Blind Single and Multi-Source Extraction

Robin Scheibler, Nobutaka Ono

In this work, we propose efficient algorithms for joint independent subspace analysis (JISA), an extension of independent component analysis that deals with parallel mixtures, where not all the components are independent. We derive an algorithmic framework for JISA based on the majorization-minimization (MM) optimization technique (JISA-MM). We use a well-known inequality for super-Gaussian sources to derive a surrogate function of the negative log-likelihood of the observed data. The minimization of this surrogate function leads to a variant of the hybrid exact-approximate diagonalization...

10.48550/arxiv.2004.03926 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Raking echoes in the time domain

Robin Scheibler, Ivan Dokmanić, Martin Vetterli

The geometry of room acoustics is such that the reverberant signal can be seen as the same waveform emitted from multiple locations. In analogy with the rake receiver from wireless communications, we propose several beamforming strategies that exploit, rather than suppress, this additional spatio-temporal diversity. Unlike earlier work in the frequency domain, time domain designs allow us to shape the impulse response of the beamformer. In particular, we can control perceptually relevant parameters, such as the amount of early echoes or the length...

10.1109/icassp.2015.7178030 article EN 2015-04-01

Pyramic: Full Stack Open Microphone Array Architecture and Dataset

Robin Scheibler, Juan Azcarreta, René Beuchat, Corentin Ferry

In this paper we introduce an open source and reproducible microphone array hardware design and an anechoic dataset recorded with this array. The Pyramic array has 48 microphones spread onto six identical modules connected to an FPGA-ARM combo. The arrangement of the microphones can be reconfigured to create a large number of array geometries. We describe in detail the architecture and make openly available all the necessary files, VHDL code, and C libraries together with extensive documentation. This effectively enables the replicability of part or the whole of the array. The curated...

10.1109/iwaenc.2018.8521337 article EN 2018-09-01

Effectiveness of Inter- and Intra-Subarray Spatial Features for Acoustic Scene Classification

Takao Kawamura, Yuma Kinoshita, Nobutaka Ono, Robin Scheibler

In this paper, we investigate the effectiveness of spatial features for acoustic scene classification (ASC) with distributed microphones. Assuming that multiple subarrays, each containing microphones, are distributed and synchronized, we consider two types of generalized cross-correlation phase transform (GCC-PHAT) as spatial features: intra- and inter-subarray GCC-PHATs. They are obtained from channels within the same subarray and between different subarrays, respectively. The log-Mel spectrogram as a spectral feature and the intra- or inter-subarray GCC-PHAT are processed in...

10.1109/icassp49357.2023.10096935 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

Hardware and software for reproducible research in audio array signal processing

Eric Bezzam, Robin Scheibler, Juan Azcarreta, Hanjie Pan, Matthieu Simeoni, and 5 more

In our demo, we present two hardware platforms for prototyping audio array signal processing. Pyramic is a 48-channel microphone array fitted on an FPGA and Compact Six is a portable microphone array with six microphones, closer to the technical constraints of consumer electronics. A browser based interface was developed that allows the user to interact with the audio stream from the arrays in real time. The software component of this demo is a Python module with implementations of basic audio signal processing blocks and popular techniques like STFT, beamforming, and DoA. Both...

10.1109/icassp.2017.8005297 article EN 2017-03-01