NFDI4DS | UHH-SEMS - Publication Details

Synthetic speech detection through short-term and long-term prediction traces

OPENALEX - Publications

Clara Borrelli Paolo Bestagini Fabio Antonacci Augusto Sarti Stefano Tubaro

Several methods for synthetic audio speech generation have been developed in the literature through the years. With the great technological advances brought by deep learning, many novel techniques achieving incredibly realistic results have recently been proposed. As these generate convincing fake human voices, they can be used in a malicious way to negatively impact today's society (e.g., people impersonation, fake news spreading, opinion formation). For this reason, the ability of detecting whether...

10.1186/s13635-021-00116-3 article EN cc-by EURASIP Journal on Information Security 2021-04-06

Deepfake Speech Detection Through Emotion Recognition: A Semantic Approach

Emanuele Conti Davide Salvi Clara Borrelli Brian Hosler Paolo Bestagini and 4 more

In recent years, audio and video deepfake technology has advanced relentlessly, severely impacting people's reputation and reliability. Several factors have facilitated the growing threat. On the one hand, the hyper-connected society of social and mass media enables the spread of multimedia content worldwide in real-time, facilitating the dissemination of counterfeit material. On the other hand, neural network-based techniques have made deepfakes easier to produce and more difficult to detect, showing that the analysis of low-level features is no longer...

10.1109/icassp43922.2022.9747186 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

Speech Audio Splicing Detection and Localization Exploiting Reverberation Cues

Davide Capoferri Clara Borrelli Paolo Bestagini Fabio Antonacci Augusto Sarti and 1 more

Manipulating speech audio recordings through splicing is a task within everyone's reach. Indeed, it is very easy to collect from social media multiple recordings of well-known public figures (e.g., actors, politicians, etc.). These can be cut into smaller excerpts that can be concatenated in order to generate new content. As fake speech of a famous person can be used for fake news spreading and can negatively impact society, the ability of detecting whether a recording has been manipulated is of great interest to the forensics community. In this work, we...

10.1109/wifs49906.2020.9360900 article EN 2020-12-06

Automatic playlist generation using Convolutional Neural Networks and Recurrent Neural Networks

Rosilde Tatiana Irene Clara Borrelli Massimiliano Zanoni Michele Buccoli Augusto Sarti

Nowadays, a great part of music consumption on streaming services is based on playlists. Playlists are still mainly generated manually by expert curators or users, a process that in several cases is not feasible given the huge amount of music to deal with. There is a need for effective automatic playlist generation techniques. Traditional approaches to the problem build a sequence of pieces that satisfies some defined criteria. However, playlist generation being a highly subjective procedure, defining an a-priori criterion can be a hard task in several cases. In this...

10.23919/eusipco.2019.8903002 article EN 2019 27th European Signal Processing Conference (EUSIPCO) 2019-09-01

A Denoising Methodology for Higher Order Ambisonics Recordings

Clara Borrelli Antonio Canclini Fabio Antonacci Augusto Sarti Stefano Tubaro

We propose a denoising methodology for spatial audio recordings acquired with spherical microphone arrays and encoded in Higher Order Ambisonics (HOA). The goal is to suppress the noise field impinging on the array, while preserving the full spatiality of the desired soundfield, produced by an acoustic source of interest within the recording environment. The proposed solution consists of three steps, carried out in the spherical harmonic domain. After estimating the direction of arrival of the source, the source signal is extracted by means of a superdirective...

10.1109/iwaenc.2018.8521364 article EN 2018-09-01

Synthetic Speech Attribution: Highlights From the IEEE Signal Processing Cup 2022 Student Competition [SP Competitions]

Davide Salvi Clara Borrelli Paolo Bestagini Fabio Antonacci Matthew C. Stamm and 2 more

The possibility of manipulating digital multimedia material is nowadays within everyone's reach. In the audio case, anybody can create fake synthetic speech tracks using various methods with almost no effort [1]. These range from simple waveform concatenation operations to more complex neural networks [2]...

10.1109/msp.2023.3268823 article EN IEEE Signal Processing Magazine 2023-09-01

A Minimal Metric for the Characterization of Acoustic Noise Emitted by Underwater Vehicles

Giacomo Picardi Clara Borrelli Augusto Sarti Giovanni Chimienti Marcello Calisti

Underwater robots emit sound during operations, which can deteriorate the quality of acoustic data recorded by on-board sensors or disturb marine fauna during in vivo observations. Notwithstanding this, there have only been a few attempts at characterizing the acoustic emissions of underwater robots in the literature, and the datasheets of commercially available devices do not report information on this topic. This work has a twofold goal. First, we identified a setup consisting of a camera directly mounted on the robot structure to acquire two...

10.3390/s20226644 article EN cc-by Sensors 2020-11-20

Three-Dimensional Mapping of High-Level Music Features for Music Browsing

Stefano Cherubin Clara Borrelli Massimiliano Zanoni Michele Buccoli Augusto Sarti and 1 more

The increased availability of musical content comes with the need for novel paradigms for recommendation, browsing and retrieval from large music libraries. Most music players and streaming services propose a paradigm based on listing meta-data information, which provides little insight into the content. In huge catalogs of songs, a more informative paradigm is needed. In this work we propose a framework for music navigation in a three-dimensional (3-D) space, where items are placed according to a 3-D mapping of their high-level semantic descriptors. We conducted...

10.1109/mmrp.2019.00013 article EN 2019-01-01

Automatic Reliability Estimation for Speech Audio Surveillance Recordings

Clara Borrelli Paolo Bestagini Fabio Antonacci Augusto Sarti Stefano Tubaro

Being able to monitor communications through environmental recordings is an important asset for a forensic investigator, e.g., to prevent terrorist attacks. On the one hand, this is becoming easier thanks to the availability of cheaper and smaller audio devices. On the other hand, the automatic analysis of huge corpora of recordings is still far from being an easy task. In this paper we propose a method to analyze speech recordings and establish how reliable they are in terms of transcription capability. This can be used to automatically select relevant...

10.1109/wifs47025.2019.9034986 article EN 2019-12-01

Resource-constrained stereo singing voice cancellation

Clara Borrelli James Rae Doğaç Başaran Matt McVicar Mehrez Souden and 1 more

We study the problem of stereo singing voice cancellation, a subtask of music source separation, whose goal is to estimate an instrumental background from a stereo mix. We explore how to achieve performance similar to large state-of-the-art source separation networks starting from a small, efficient model for real-time speech separation. Such a model is useful when memory and compute are limited and processing has to run with limited look-ahead. In practice, this is realised by adapting an existing mono model to handle stereo input. Improvements in quality are obtained by tuning...

10.48550/arxiv.2401.12068 preprint EN other-oa arXiv (Cornell University) 2024-01-01

Resource-Constrained Stereo Singing Voice Cancellation

Clara Borrelli James Rae Doğaç Başaran Matt McVicar Mehrez Souden and 1 more

We study the problem of stereo singing voice cancellation, a subtask of music source separation, whose goal is to estimate an instrumental background from a stereo mix. We explore how to achieve performance similar to large state-of-the-art source separation networks starting from a small, efficient model for real-time speech separation. Such a model is useful when memory and compute are limited and processing has to run with limited look-ahead. In practice, this is realised by adapting an existing mono model to handle stereo input. Improvements in quality are obtained by tuning...

10.1109/icassp48485.2024.10446245 article EN ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18

A Data-Driven Approach for Acoustic Parameter Similarity Estimation of Speech Recording

Mattia Papa Clara Borrelli Paolo Bestagini Fabio Antonacci Augusto Sarti and 1 more

Speech audio acquisitions exhibit different quality and reverberation properties depending on the recording setup and environment. For this reason, it is expected that speech analysis systems that work correctly on certain recordings may fail on others acquired in different acoustic contexts. Therefore, being able to tell whether a speech track under analysis shares the same acoustic characteristics as a reference one is useful to understand if it can be successfully processed by a given system. Alternatively, in a forensic scenario, an estimate of the acoustic parameter similarity...

10.1109/icassp43922.2022.9747043 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

Three-Dimensional Mapping of High-Level Music Features for Music Browsing

Stefano Cherubin Clara Borrelli Massimiliano Zanoni Michele Buccoli A. Sarti and 1 more

The increased availability of musical content comes with the need for novel paradigms for recommendation, browsing and retrieval from large music libraries. Most music players and streaming services propose a paradigm based on listing meta-data information, which provides little insight into the content. In huge catalogs of songs, a more informative paradigm is needed. In this work we propose a framework for music navigation in a three-dimensional (3-D) space, where items are placed according to a 3-D mapping of their high-level semantic descriptors. We conducted...

10.1109/mmrp.2019.8665368 article EN 2019-01-01

Problemi della Napoli postunitaria nei Vermi di Francesco Mastriani

Clara Borrelli

10.1400/176927 article IT 2011-01-01

Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection

Luigi Attorresi Davide Salvi Clara Borrelli Paolo Bestagini Stefano Tubaro

The rapid spread of media content synthesis technology and the potentially damaging impact of audio and video deepfakes on people's lives have raised the need to implement systems able to detect these forgeries automatically. In this work we present a novel approach for synthetic speech detection, exploiting the combination of two high-level semantic properties of the human voice. On one side, we focus on speaker identity cues and represent them as speaker embeddings extracted using a state-of-the-art method for the automatic speaker verification task....

10.48550/arxiv.2210.17222 preprint EN other-oa arXiv (Cornell University) 2022-01-01