Climent Nadeu

ORCID: 0000-0002-5863-0983
Research Areas
  • Speech and Audio Processing
  • Speech Recognition and Synthesis
  • Music and Audio Processing
  • Advanced Data Compression Techniques
  • Indoor and Outdoor Localization Technologies
  • Advanced Adaptive Filtering Techniques
  • Blind Source Separation Techniques
  • Image and Signal Denoising Methods
  • Neural Networks and Applications
  • Natural Language Processing Techniques
  • Phonetics and Phonology Research
  • Time Series Analysis and Forecasting
  • Underwater Acoustics Research
  • Anomaly Detection Techniques and Applications
  • Infant Health and Development
  • Phonocardiography and Auscultation Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Structural Health Monitoring Techniques
  • Video Surveillance and Tracking Methods
  • Mathematical Analysis and Transform Methods
  • Speech and Dialogue Systems
  • Healthcare Technology and Patient Monitoring
  • Underwater Vehicles and Communication Systems
  • Ultrasonics and Acoustic Wave Propagation
  • Video Analysis and Summarization

Universitat Politècnica de Catalunya
2011-2024

FC Barcelona
1987-2017

Addis Ababa University
2016

National Student Clearinghouse Research Center
2000-2008

Centre Tecnologic de Telecomunicacions de Catalunya
1992

This paper describes the phonetic content of Albayzin, a spoken database for Spanish designed for speech recognition purposes. A statistical study of a large sample of spontaneous speech is presented, and the criteria for the final constitution of the database are discussed. Finally, the contents are analyzed.

10.21437/eurospeech.1993-66 article EN 1993-09-22

10.1016/j.patrec.2009.06.009 article EN Pattern Recognition Letters 2009-07-03

Cepstral coefficients are widely used in speech recognition. In this paper, we claim that they are not the best way of representing the spectral envelope, at least for some usual recognition systems. In fact, the cepstrum has several disadvantages: poor physical meaning, need of transformation, and low capacity of adaptation. We propose a new representation that significantly outperforms both mel-cepstrum and LPC-cepstrum techniques in recognition rate and computational cost. It consists of filtering the frequency sequence of filter-bank energies...

10.21437/eurospeech.1995-220 article EN 1995-09-18

In this paper, features which are usually employed in automatic speech recognition (ASR) are used for the detection of seizures in newborn EEG. In particular, spectral envelope-based features, composed of powers and their derivatives, are compared to the established feature set which has been previously developed for EEG analysis. The results indicate that the ASR features that model derivatives, either full-band or localized in frequency, yielded a performance improvement in comparison to spectral-power-based features. Indeed it is shown here that they...

10.1109/titb.2011.2159805 article EN IEEE Transactions on Information Technology in Biomedicine 2011-06-21

The article presents a robust representation of speech based on AR modeling of the causal part of the autocorrelation sequence. In noisy speech recognition, this new representation achieves better results than several other related techniques.

10.1109/89.554273 article EN IEEE Transactions on Speech and Audio Processing 1997-01-01

Recently, audio segmentation has attracted research interest because of its usefulness in several applications like audio indexing and retrieval, subtitling, monitoring of acoustic scenes, etc. Moreover, a previous audio segmentation stage may be useful to improve the robustness of speech technologies like automatic speech recognition and speaker diarization. In this article, we present the evaluation of broadcast news segmentation systems carried out in the context of the Albayzín-2010 evaluation campaign. That evaluation consisted of segmenting audio from the 3/24 Catalan TV channel into five classes:...

10.1186/1687-4722-2011-1 article EN cc-by EURASIP Journal on Audio Speech and Music Processing 2011-06-17

This work aims at gaining an insight into the mean and variance normalization technique (MVN), which is commonly used to increase the robustness of speech recognition features. Several versions of MVN are empirically investigated, and the factors affecting their performance are considered. The reported experimental work with real-world speech data (Speecon) particularly focuses on the recursive updating of MVN parameters, paying attention to the involved algorithmic delay. First, we propose a decoupling of the look-ahead factor (which...

10.1109/icassp.2006.1660135 article EN 2006-08-03
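
The recursive updating of MVN parameters mentioned in the abstract above can be illustrated with a minimal sketch. It assumes exponential forgetting with a hypothetical factor alpha and no look-ahead; the function name recursive_mvn, the initialization, and the update order are illustrative choices, not the method of the paper.

```python
import numpy as np

def recursive_mvn(features, alpha=0.99, eps=1e-8):
    """Normalize each frame with running mean/variance estimates.

    features: array of shape (frames, dims).
    alpha:    assumed forgetting factor (hypothetical default).
    """
    features = np.asarray(features, dtype=float)
    mean = features[0].copy()      # assumed initialization: first frame
    var = np.ones_like(mean)       # assumed initialization: unit variance
    out = np.empty_like(features)
    for t, x in enumerate(features):
        # Update the running estimates using the current frame only (zero look-ahead).
        mean = alpha * mean + (1.0 - alpha) * x
        var = alpha * var + (1.0 - alpha) * (x - mean) ** 2
        out[t] = (x - mean) / np.sqrt(var + eps)
    return out
```

A smaller alpha adapts faster to changing conditions but gives noisier estimates; utterance-level MVN corresponds to computing the mean and variance once over all frames instead.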

Speech activity detection (SAD) is a key objective in speech-related technologies. In this work, an enhanced version of the training stage of a SAD system based on a support vector machine (SVM) classifier is presented, and its performance is tested with the RT05 and RT06 evaluation tasks. A fast algorithm of data reduction based on proximal SVM has been developed and, furthermore, the specific characteristics of the metric used in the NIST evaluation have been taken into account during training. Tested with this data, the resulting system has shown better scores than the best...

10.1109/icassp.2007.367247 article EN 2007-04-01

Acoustic events produced in meeting-room-like environments may carry information useful for perceptually aware interfaces. We focus on the problem of classifying 16 types of acoustic events, using and comparing several features and various classifiers based either on GMM or SVM. A variable-feature-set clustering scheme is developed and compared with an already reported binary tree scheme. In our experiments with event-level features, the proposed SVM scheme achieves a 31.5% relative error reduction with respect to the best result from...

10.1109/icassp.2005.1416351 article EN 2006-10-04

Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information shows a large amount of errors, which are mostly due to temporal overlaps. Actually, temporal overlaps accounted for more than 70% of errors in the real-world interactive seminar recordings used in CLEAR 2007 evaluations. In this paper, we improve the recognition rate of acoustic events using information from both audio and video modalities. First, data...

10.1155/2011/485738 article EN cc-by EURASIP Journal on Advances in Signal Processing 2011-02-13

Acoustic source localization and sound recognition are common acoustic scene analysis tasks that are usually considered separately. In this paper, a new source localization technique is proposed that works jointly with an acoustic event detection system. Given the identities and end-points of simultaneous sounds, the technique uses statistical models of those sounds to compute a likelihood score for each model and signal at the output of a set of null-steering beamformers per microphone array. Those scores are subsequently combined to find the MAP-optimal positions in the room....

10.1109/icassp.2014.6853670 article EN 2014-05-01

Recently, the advantages of the spectral parameters obtained by frequency filtering (FF) of the logarithmic filter-bank energies (logFBEs) have been reported. These parameters, which are frequency derivatives of the logFBEs, lie in the frequency domain, and have shown good recognition performance with respect to the conventional mel-frequency cepstral coefficients (MFCCs) for hidden Markov models (HMM) based systems. In this paper, the FF features are first compared with the MFCCs and the relative spectral perceptual linear prediction (Rasta-PLP) features using both a hybrid HMM/MLP...

10.1109/tsa.2004.834466 article EN IEEE Transactions on Speech and Audio Processing 2004-12-20
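
As a rough illustration of frequency filtering described in the abstract above, the sketch below takes a derivative-like difference of the logFBEs along the band axis. It assumes the second-order filter H(z) = z - z^-1 with zero padding at both band edges; the filter choice, the padding, and the function name frequency_filter are assumptions for illustration, since the abstract only states that the parameters are frequency derivatives of the logFBEs.

```python
import numpy as np

def frequency_filter(log_fbe):
    """Filter log filter-bank energies along the frequency (band) axis.

    log_fbe: array of shape (frames, bands).
    Returns y[t, k] = x[t, k + 1] - x[t, k - 1], with zeros assumed
    beyond the first and last band.
    """
    log_fbe = np.asarray(log_fbe, dtype=float)
    padded = np.pad(log_fbe, ((0, 0), (1, 1)))  # zero-pad the band axis only
    return padded[:, 2:] - padded[:, :-2]

# Example: one frame with four bands.
frame = np.array([[1.0, 2.0, 4.0, 7.0]])
print(frequency_filter(frame))  # [[ 2.  3.  5. -4.]]
```

The output has one value per band, so the features stay in the frequency domain, unlike cepstral coefficients, which require an additional transform.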

When performing speaker diarization, it is common practice to use an agglomerative clustering approach where the acoustic data is first split in small segments and then pairs of these segments are merged until a particular stopping point is reached. The diarization performance can be greatly improved by a speech/non-speech detector. The detector helps the system by preventing non-speech frames from "confusing" both the clustering and merging processes. Over the years there has been extensive research on speech/non-speech detectors. Often times, detectors...

10.1109/odyssey.2006.248109 article EN 2006-06-01