- Speech and Audio Processing
- Music and Audio Processing
- Speech Recognition and Synthesis
- Historical and Environmental Studies
- Italian Literature and Culture
- Advanced Adaptive Filtering Techniques
- Linguistic Studies and Language Acquisition
- Underwater Acoustics Research
- Music Technology and Sound Studies
- Acoustic Wave Phenomena Research
- Hearing Loss and Rehabilitation
- Spanish Literature and Culture Studies
- Italian Fascism and Post-war Society
- Marine animal studies overview
- Digital Media Forensic Detection
- Advanced Data Compression Techniques
- Underwater Vehicles and Communication Systems
- Libraries, Manuscripts, and Books
- Early Modern Spanish Literature
- Image and Signal Denoising Methods
- Video Analysis and Summarization
- Diverse academic and cultural studies
Apple (United Kingdom)
2024
Politecnico di Milano
2018-2023
Several methods for synthetic audio speech generation have been developed in the literature through the years. With the great technological advances brought by deep learning, many novel synthetic speech techniques achieving incredibly realistic results have recently been proposed. As these methods generate convincing fake human voices, they can be used in a malicious way to negatively impact today’s society (e.g., people impersonation, news spreading, opinion formation). For this reason, the ability of detecting whether...
In recent years, audio and video deepfake technology has advanced relentlessly, severely impacting people's reputation and reliability. Several factors have facilitated the growing deepfake threat. On the one hand, the hyper-connected society of social and mass media enables the spread of multimedia content worldwide in real-time, facilitating the dissemination of counterfeit material. On the other hand, neural network-based techniques have made deepfakes easier to produce and more difficult to detect, showing that the analysis of low-level features is no longer...
Manipulating speech audio recordings through splicing is a task within everyone's reach. Indeed, it is very easy to collect through social media multiple audio recordings from well-known public figures (e.g., actors, politicians, etc.). These can be cut into smaller excerpts that can be concatenated in order to generate new audio content. As fake speech from a famous person can be used for news spreading and can negatively impact society, the ability of detecting whether a speech recording has been manipulated is of great interest in the forensics community. In this work, we...
Nowadays, a great part of music consumption on streaming services is based on playlists. Playlists are still mainly manually generated by expert curators or by users, a process that in several cases is not feasible given the huge amount of music to deal with. There is the need for effective automatic playlist generation techniques. Traditional approaches to the problem are based on building a sequence of music pieces that satisfies some defined criteria. However, playlist generation being a highly subjective procedure, defining an a-priori criterion can be a hard task in several cases. In this...
We propose a denoising methodology for spatial audio recordings acquired with spherical microphone arrays and encoded in Higher Order Ambisonics (HOA). The goal is to suppress the noise field impinging on the array, while preserving the full spatiality of the desired soundfield, produced by an acoustic source of interest within the recording environment. The proposed solution consists of three steps, carried out in the harmonic domain. After estimating the direction of arrival of the source, the source signal is extracted by means of superdirective...
The possibility of manipulating digital multimedia material is nowadays within everyone's reach. In the audio case, anybody can create fake synthetic speech tracks using various methods with almost no effort [1]. These methods range from simple waveform concatenation operations to more complex neural networks...
Underwater robots emit sound during operations, which can deteriorate the quality of acoustic data recorded by on-board sensors or disturb marine fauna during in vivo observations. Notwithstanding this, there have only been a few attempts at characterizing the acoustic emissions of underwater robots in the literature, and the datasheets of commercially available devices do not report information on this topic. This work has a twofold goal. First, we identified a setup consisting of a camera directly mounted on the robot structure to acquire two...
The increased availability of musical content comes with the need for novel paradigms for recommendation, browsing and retrieval from large music libraries. Most music players and streaming services propose a paradigm based on listing meta-data information, which provides little insight into the music content. In huge catalogs of songs, a more informative paradigm is needed. In this work we propose a framework for music navigation in a three-dimensional (3-D) space, where music items are placed as a 3-D mapping of their high-level semantic descriptors. We conducted...
Being able to monitor communications through environmental recordings is an important asset for a forensic investigator, e.g., to prevent terrorist attacks. On the one hand, this is becoming easier thanks to the availability of cheaper and smaller audio recording devices. On the other hand, the automatic analysis of huge corpora of audio recordings is still far from being an easy task. In this paper we propose a method to analyze speech recordings and establish how reliable they are in terms of transcription capability. This method can be used to automatically select relevant...
We study the problem of stereo singing voice cancellation, a subtask of music source separation, whose goal is to estimate an instrumental background from a stereo mix. We explore how to achieve performance similar to large state-of-the-art source separation networks starting from a small, efficient model for real-time speech separation. Such a model is useful when memory and compute are limited and singing voice processing has to run with limited look-ahead. In practice, this is realised by adapting an existing mono model to handle stereo input. Improvements in quality are obtained by tuning...
Speech audio acquisitions exhibit different quality and reverberation properties depending on the recording setup and environment. For this reason, it is expected that speech analysis systems that work correctly on certain recordings may fail on others acquired in different acoustic contexts. Therefore, being able to tell whether a speech track under analysis shares the same acoustic characteristics of a reference one is useful to understand if it can be successfully processed by a given system. Alternatively, in a forensic scenario, an estimate of parameter similarity...
The rapid spread of media content synthesis technology and the potentially damaging impact of audio and video deepfakes on people's lives have raised the need to implement systems able to detect these forgeries automatically. In this work we present a novel approach for synthetic speech detection, exploiting the combination of two high-level semantic properties of the human voice. On one side, we focus on speaker identity cues and represent them as speaker embeddings extracted using a state-of-the-art method for the automatic speaker verification task....