Mark D. Plumbley

ORCID: 0000-0002-9708-1075
Research Areas
  • Music and Audio Processing
  • Speech and Audio Processing
  • Music Technology and Sound Studies
  • Blind Source Separation Techniques
  • Speech Recognition and Synthesis
  • Sparse and Compressive Sensing Techniques
  • Neural Networks and Applications
  • Advanced Adaptive Filtering Techniques
  • Image and Signal Denoising Methods
  • Video Analysis and Summarization
  • Animal Vocal Communication and Behavior
  • Neuroscience and Music Perception
  • Neural dynamics and brain function
  • Diverse Musicological Studies
  • Natural Language Processing Techniques
  • Subtitles and Audiovisual Media
  • Hearing Loss and Rehabilitation
  • Anomaly Detection Techniques and Applications
  • Structural Health Monitoring Techniques
  • Acoustic Wave Phenomena Research
  • Control Systems and Identification
  • Digital Media Forensic Detection
  • Spectroscopy and Chemometric Analyses
  • Underwater Acoustics Research
  • Phonocardiography and Auscultation Techniques

University of Surrey
2016-2025

Signal Processing (United States)
2017-2025

Institute of Acoustics
2024

Chinese Academy of Sciences
2024

Hôpital Larrey
2024

Tampere University
2023

Queen Mary University of London
2008-2021

Institute of Electrical and Electronics Engineers
2021

Pontifical Catholic University of Rio de Janeiro
2021

Universidade Federal de Santa Catarina
2021

Audio pattern recognition is an important research topic in the machine learning area, and includes several tasks such as audio tagging, acoustic scene classification, music classification, speech emotion classification and sound event detection. Recently, neural networks have been applied to tackle audio pattern recognition problems. However, previous systems are built on specific datasets with limited durations. Recently, in computer vision and natural language processing, systems pretrained on large-scale datasets have generalized well to several tasks. However, there is limited research on pretraining systems on large-scale datasets for audio pattern recognition...

10.1109/taslp.2020.3030497 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2020-01-01
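Pretrained audio tagging systems of this kind typically take log-mel spectrograms as input. As an illustration only (not the paper's implementation), a minimal log-mel front end can be sketched in plain NumPy; the sample rate, frame sizes, and mel-band count below are arbitrary choices:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    # triangular filters spaced uniformly on the mel scale
    fmax = fmax or sr / 2
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fb[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):
            fb[m - 1, k] = (right - k) / (right - centre)
    return fb

def log_mel(x, sr=16000, n_fft=512, hop=256, n_mels=64):
    # frame, window, and take the power spectrum of each frame
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(np.stack(frames), axis=1)) ** 2
    mel = spec @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)    # (frames, mel bands)
```

In practice one would use an audio library's tested implementation; this sketch only shows the shape of the computation (frames x mel bands) that a tagging network consumes.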

Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists' productivity and the reliability of their software.

10.1371/journal.pbio.1001745 article EN cc-by PLoS Biology 2014-01-07
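Among the practices described is turning expectations about code into automated tests. A minimal, hypothetical Python illustration (the function and its test are invented for this example):

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:  # defensive check: fail early with a clear message
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)

def test_mean():
    # expectations written as runnable checks (e.g. collected by pytest)
    assert mean([1, 2, 3]) == 2
    assert mean([5]) == 5
```

A test like this documents the intended behaviour and catches regressions automatically when the code later changes.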

For intelligent systems to make best use of the audio modality, it is important that they can recognize not just speech and music, which have been researched as specific tasks, but also general sounds in everyday environments. To stimulate research in this field we conducted a public challenge: the IEEE Audio and Acoustic Signal Processing Technical Committee challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). In this paper, we report on the state of the art in automatically classifying audio scenes and detecting audio events. We...

10.1109/tmm.2015.2428998 article EN cc-by IEEE Transactions on Multimedia 2015-05-01

In this article we present an account of the state of the art in acoustic scene classification (ASC), the task of classifying environments from the sounds they produce. Starting from a historical review of previous research in this area, we define a general framework for ASC and describe different implementations of its components. We then describe a range of different algorithms submitted for a data challenge that was held to provide a fair benchmark for ASC techniques. The dataset recorded for this purpose is presented, along with the performance metrics that are used to evaluate...

10.1109/msp.2014.2326181 article EN IEEE Signal Processing Magazine 2015-04-03

Public evaluation campaigns and benchmark datasets promote active development in target research areas, allowing direct comparison of algorithms. The second edition of the challenge on detection and classification of acoustic scenes and events (DCASE 2016) has offered such an opportunity for the development of state-of-the-art methods, and succeeded in drawing together a large number of participants from academic and industrial backgrounds. In this paper, we report on the tasks and outcomes of the DCASE 2016 challenge. The challenge comprised four tasks: acoustic scene classification,...

10.1109/taslp.2017.2778423 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2017-11-30

Automatic species classification of birds from their sound is a computational tool of increasing importance in ecology, conservation monitoring and vocal communication studies. To make it useful in practice, it is crucial to improve its accuracy while ensuring that it can run at big data scales. Many approaches use acoustic measures based on spectrogram-type data, such as the Mel-frequency cepstral coefficient (MFCC) features, which represent a manually-designed summary of spectral information. However, recent...

10.7717/peerj.488 article EN cc-by PeerJ 2014-07-17

In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won 1st place in the large-scale weakly supervised sound event detection task of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 challenge. The audio clips in this task, which are extracted from YouTube videos, are manually labelled with one or more audio tags, but without time stamps of the audio events, and are hence referred to as weakly labelled data. Two subtasks are defined in the challenge, including audio tagging...

10.1109/icassp.2018.8461975 article EN 2018-04-01
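The gating referred to here is the gated linear unit pattern, in which one convolution's output modulates another's through a sigmoid. A minimal 1-D NumPy sketch (the kernels and shapes are illustrative, not the winning system's configuration):

```python
import numpy as np

def conv1d_valid(x, w):
    # 'valid' 1-D correlation of signal x with kernel w
    k = len(w)
    return np.array([x[i:i + k] @ w for i in range(len(x) - k + 1)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conv1d(x, w_linear, w_gate):
    # gated linear unit: a linear path multiplied by a learned sigmoid gate
    return conv1d_valid(x, w_linear) * sigmoid(conv1d_valid(x, w_gate))
```

With an all-zero gate kernel the sigmoid outputs 0.5 everywhere, so the gated output is simply half the linear convolution; a trained gate instead learns where to suppress or pass information along the time axis.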

Text-to-audio (TTA) systems have recently gained attention for their ability to synthesize general audio based on text descriptions. However, previous studies in TTA have limited generation quality with high computational costs. In this study, we propose AudioLDM, a TTA system that is built on a latent space to learn continuous audio representations from contrastive language-audio pretraining (CLAP) latents. The pretrained CLAP models enable us to train LDMs with audio embeddings while providing text embeddings as the condition during sampling. By...

10.48550/arxiv.2301.12503 preprint EN other-oa arXiv (Cornell University) 2023-01-01

The advancement of audio-language (AL) multimodal learning tasks has been significant in recent years, yet the limited size of existing audio-language datasets poses challenges for researchers due to the costly and time-consuming collection process. To address this data scarcity issue, we introduce WavCaps, the first large-scale weakly-labelled audio captioning dataset, comprising approximately 400k audio clips with paired captions. We sourced audio clips and their raw descriptions from web sources and a sound event detection dataset. However,...

10.1109/taslp.2024.3419446 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2024-01-01

Although audio generation shares commonalities across different types of audio, such as speech, music, and sound effects, designing models for each type requires careful consideration of specific objectives and biases that can significantly differ from those of other types. To bring us closer to a unified perspective of audio generation, this paper proposes a holistic framework that utilizes the same learning method for speech, music, and sound effect generation. Our framework introduces a general representation of audio, called "language of audio" (LOA). Any audio can be translated into...

10.1109/taslp.2024.3399607 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2024-01-01

We consider the task of solving the independent component analysis (ICA) problem x = As given observations x, with a constraint of nonnegativity on the source random vector s. We refer to this as nonnegative independent component analysis, and we consider methods for performing this task. For sources with a nonzero probability density function (pdf) p(s) down to s = 0, it is sufficient to find an orthonormal rotation y = Wz of the prewhitened observations z = Vx which minimizes the mean squared error of the reconstruction of z from the rectified version y+ of y. We suggest some algorithms to perform this, both based on...

10.1109/tnn.2003.810616 article EN IEEE Transactions on Neural Networks 2003-05-01
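For two unit-variance nonnegative sources, the rotation described here can be found by a direct search over the rotation angle, minimizing the error of reconstructing the prewhitened data from the rectified outputs. A toy NumPy sketch (a grid search stands in for the paper's gradient-based algorithms; the mixing matrix and source distribution are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.exponential(size=(2, 5000))        # nonnegative, unit-variance sources
A = np.array([[1.0, 0.4], [0.3, 1.0]])     # illustrative mixing matrix
X = A @ S                                   # observations x = A s

# prewhiten: z = V x has identity covariance (mean kept, so z stays a
# rotation of the nonnegative sources)
C = np.cov(X)
w, E = np.linalg.eigh(C)
V = E @ np.diag(w ** -0.5) @ E.T
Z = V @ X

def recon_error(theta):
    # MSE of reconstructing z from the rectified rotated outputs y+ = max(y, 0)
    W = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Y = W @ Z
    Yp = np.maximum(Y, 0.0)
    return np.mean((Z - W.T @ Yp) ** 2)

thetas = np.linspace(0, 2 * np.pi, 720, endpoint=False)
best = thetas[np.argmin([recon_error(t) for t in thetas])]
```

At the correct rotation the outputs are (approximately) the nonnegative sources themselves, so rectification changes nothing and the reconstruction error collapses toward zero.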

We propose an audio inpainting framework that recovers portions of audio data distorted due to impairments such as impulsive noise, clipping, and packet loss. In this framework, the distorted data are treated as missing, and their location is assumed to be known. The signal is decomposed into overlapping time-domain frames, and the restoration problem is then formulated as an inverse problem per audio frame. Sparse representation modeling is employed per frame, and each problem is solved using the Orthogonal Matching Pursuit algorithm together with a discrete cosine or Gabor...

10.1109/tasl.2011.2168211 article EN IEEE Transactions on Audio Speech and Language Processing 2011-09-22
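Per frame, the recovery described above amounts to sparse coding over only the observed rows of a transform dictionary, then resynthesizing the full frame. A toy sketch with a DCT dictionary and Orthogonal Matching Pursuit (NumPy only; the frame length, gap position, and sparsity level are arbitrary choices for illustration):

```python
import numpy as np

N = 64
# DCT-II dictionary, columns normalized to unit norm
n = np.arange(N)
D = np.cos(np.pi * (n[:, None] + 0.5) * n[None, :] / N)
D /= np.linalg.norm(D, axis=0)

def omp(A, b, k):
    # Orthogonal Matching Pursuit: k greedy atom selections,
    # with a least-squares refit of the coefficients at each step
    support, r = [], b.copy()
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ r))))
        coef, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        r = b - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

# a frame that is 3-sparse in the dictionary, with a burst of missing samples
x_true = D[:, [3, 10, 25]] @ np.array([1.0, -0.5, 0.8])
mask = np.ones(N, dtype=bool)
mask[20:30] = False                       # 10 consecutive samples lost
coefs = omp(D[mask], x_true[mask], 3)     # sparse code from observed rows only
x_hat = D @ coefs                         # inpaint by resynthesizing the frame
```

Because the sparse code is estimated from the observed samples alone, resynthesizing with the full dictionary fills in the missing burst.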

Sparse representations have proved a powerful tool in the analysis and processing of audio signals, and already lie at the heart of popular coding standards such as MP3 and Dolby AAC. In this paper we give an overview of a number of current and emerging applications of sparse representations in areas from audio coding, audio enhancement and music transcription to blind source separation solutions that can solve the "cocktail party problem." In each case we will show how...

10.1109/jproc.2009.2030345 article EN Proceedings of the IEEE 2009-11-20

We present a simple and efficient method for beat tracking of musical audio. With the aim of replicating the human ability of tapping in time to music, we formulate our approach using a two state model. The first state performs tempo induction and tracks tempo changes, while the second maintains contextual continuity within a single tempo hypothesis. Beat times are recovered by passing the output of an onset detection function through adaptively weighted comb filterbank matrices, to separately identify the beat period and alignment. We evaluate our beat tracker both...

10.1109/tasl.2006.885257 article EN IEEE Transactions on Audio Speech and Language Processing 2007-03-01
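The tempo-induction idea can be illustrated by scoring an onset-strength envelope against comb-like templates at candidate beat periods. A toy NumPy sketch (simple autocorrelation-style scoring with a few comb "teeth", not the paper's adaptively weighted filterbank):

```python
import numpy as np

def estimate_period(onset_env, min_lag=2, max_lag=None):
    # score each candidate beat period by summing envelope correlations
    # at integer multiples of the lag (comb-like teeth)
    e = onset_env - onset_env.mean()
    n = len(e)
    max_lag = max_lag or n // 4
    scores = {}
    for lag in range(min_lag, max_lag + 1):
        s = 0.0
        for m in (1, 2, 3):               # first three comb teeth
            if m * lag < n:
                s += e[:n - m * lag] @ e[m * lag:]
        scores[lag] = s
    return max(scores, key=scores.get)

# synthetic onset envelope: an impulse every 8 frames (the "beats")
env = np.zeros(128)
env[::8] = 1.0
```

Summing over several comb teeth favours the true period over its subdivisions and multiples, which plain one-lag autocorrelation can confuse.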

We investigate the conditions for which nonnegative matrix factorization (NMF) is unique, and introduce several theorems which can determine whether a decomposition is in fact unique or not. The theorems are illustrated by several examples showing the use of the theorems and their limitations. We show that corruption of a unique NMF by additive noise leads to a noisy estimation of the noise-free unique solution. Finally, we use a stochastic view of NMF to analyze which characterization of the underlying model will result in an NMF with small errors.

10.1155/2008/764206 article EN cc-by Computational Intelligence and Neuroscience 2008-01-01
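For context, the factorization in question approximates a nonnegative matrix V by a product WH with nonnegative factors. The classic multiplicative-update rules for the Frobenius-norm objective (Lee and Seung's algorithm, not a method specific to this paper) can be sketched as:

```python
import numpy as np

def nmf(V, rank, iters=200, seed=0):
    # multiplicative updates for min ||V - WH||_F with W, H >= 0;
    # updates preserve nonnegativity because all factors are nonnegative
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 0.1
    H = rng.random((rank, n)) + 0.1
    eps = 1e-10                      # avoid division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The non-uniqueness the paper studies is visible here: for any suitable invertible nonnegative-preserving matrix Q, the pair (WQ, Q^-1 H) fits V equally well, so extra conditions are needed for the factors themselves to be identifiable.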

This paper describes a newly-launched public evaluation challenge on acoustic scene classification and detection of sound events within a scene. Systems dealing with such tasks are still far from exhibiting human-like performance and robustness. Undermining factors are numerous: the extreme variability of sources of interest and possible interference, and the presence of complex background noise as well as room effects like reverberation. The proposed challenge is an attempt to help the research community move forward in defining and studying...

10.1109/waspaa.2013.6701819 preprint EN 2013-10-01

This paper presents two new algorithms for wideband spectrum sensing at sub-Nyquist sampling rates, for both single nodes and cooperative multiple nodes. In single-node spectrum sensing, a two-phase algorithm based on compressive measurements is proposed to reduce the computational complexity and improve the robustness for secondary users (SUs). In the cooperative multiple-node case, the signals received at the SUs exhibit a sparsity property that yields a low-rank matrix of compressed measurements at the fusion center, and therefore leads to a matrix completion problem. In addition, the algorithms are evaluated on TV white...

10.1109/tsp.2015.2512562 article EN cc-by IEEE Transactions on Signal Processing 2015-12-25

For dictionary-based decompositions of certain types, it has been observed that there might be a link between sparsity in the dictionary and sparsity in the decomposition. Sparsity in the dictionary is also associated with the derivation of fast and efficient dictionary learning algorithms. Therefore, in this paper we present a greedy adaptive dictionary learning algorithm that sets out to find sparse atoms for speech signals. The algorithm learns the dictionary on data frames taken from a speech signal. It iteratively extracts the data frame with minimum sparsity index and adds this to the dictionary matrix. The contribution of this atom is then removed from the signal, and the process...

10.1109/jstsp.2011.2157892 article EN IEEE Journal of Selected Topics in Signal Processing 2011-05-27
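The sparsity index mentioned here is the l1/l2 norm ratio of a frame (lower means sparser). The greedy loop can be sketched as follows; this is a simplified reading of the procedure, not the authors' code:

```python
import numpy as np

def gad(X, n_atoms):
    # X: columns are signal frames; returns a matrix of orthonormal atoms
    R = X.astype(float).copy()
    atoms = []
    for _ in range(n_atoms):
        norms = np.linalg.norm(R, axis=0)
        # sparsity index = l1 / l2 per residual frame (inf for empty frames)
        index = np.where(norms > 1e-9,
                         np.abs(R).sum(axis=0) / np.maximum(norms, 1e-12),
                         np.inf)
        k = int(np.argmin(index))         # sparsest remaining frame
        d = R[:, k] / norms[k]            # normalize it into an atom
        atoms.append(d)
        R -= np.outer(d, d @ R)           # remove this atom's contribution
    return np.stack(atoms, axis=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 100))        # stand-in for speech frames
D = gad(X, 5)
```

Because each new atom is drawn from residuals that are already orthogonal to the previously chosen atoms, the learned dictionary is orthonormal by construction.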

This article deals with learning dictionaries for sparse approximation whose atoms are both adapted to a training set of signals and mutually incoherent. To meet this objective, we employ a dictionary learning scheme consisting of sparse approximation followed by a dictionary update, and we add to the latter a decorrelation step in order to reach a target mutual coherence level. This step is accomplished by an iterative projection method, complemented by a rotation of the dictionary. Experiments on musical audio data and a comparison with the method of optimal coherence-constrained directions (MOCOD)...

10.1109/tsp.2013.2245663 article EN IEEE Transactions on Signal Processing 2013-02-06
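The iterative projection idea can be sketched by alternately capping the off-diagonal Gram-matrix entries at the target coherence and mapping the result back to a dictionary via an eigendecomposition. This is a simplified sketch of that one step (the paper pairs it with a rotation of the dictionary, which is omitted here):

```python
import numpy as np

def mutual_coherence(D):
    # largest absolute inner product between distinct normalized atoms
    Dn = D / np.linalg.norm(D, axis=0)
    G = np.abs(Dn.T @ Dn)
    np.fill_diagonal(G, 0.0)
    return G.max()

def decorrelate(D, mu_target, iters=50):
    d = D.shape[0]
    for _ in range(iters):
        Dn = D / np.linalg.norm(D, axis=0)
        G = Dn.T @ Dn
        G = np.clip(G, -mu_target, mu_target)  # cap off-diagonal coherence
        np.fill_diagonal(G, 1.0)
        # map the clipped Gram matrix back to a rank-d dictionary factor
        w, V = np.linalg.eigh(G)
        top = np.argsort(w)[::-1][:d]
        D = (V[:, top] * np.sqrt(np.maximum(w[top], 0.0))).T
    return D / np.linalg.norm(D, axis=0)
```

Each pass trades a little approximation quality for lower coherence, which is why the full method interleaves this projection with updates that re-fit the training data.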

Sound event detection (SED) is the task of detecting sound events in an audio recording. One challenge of the SED task is that many datasets, such as the Detection and Classification of Acoustic Scenes and Events (DCASE) datasets, are weakly labelled. That is, there are only audio tags for each audio clip, without the onset and offset times of the sound events. We compare segment-wise and clip-wise training for SED, a comparison lacking in previous works. We propose a convolutional neural network transformer (CNN-Transformer) for audio tagging and SED, and show that CNN-Transformer performs similarly to a convolutional recurrent neural network (CRNN)...

10.1109/taslp.2020.3014737 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2020-01-01

Deep learning techniques have recently been used to tackle the audio source separation problem. In this work, we propose to use deep fully convolutional denoising autoencoders (CDAEs) for monaural audio source separation. We use as many CDAEs as the number of sources to be separated from the mixed signal. Each CDAE is trained to separate one source and treats the other sources as background noise. The main idea is to allow each CDAE to learn suitable spectral-temporal filters and features for its corresponding source. Our experimental results show that CDAEs perform slightly...

10.1109/globalsip.2017.8309164 article EN 2017-11-01