- Speech and Audio Processing
- Mobile Crowdsensing and Crowdsourcing
- Image and Video Quality Assessment
- Hearing Loss and Rehabilitation
- Advanced Adaptive Filtering Techniques
- Speech Recognition and Synthesis
- Music and Audio Processing
- Open Source Software Innovations
- Indoor and Outdoor Localization Technologies
- Technology Adoption and User Behaviour
- Video Coding and Compression Technologies
- Visual Attention and Saliency Detection
- Advanced Image Processing Techniques
- Natural Language Processing Techniques
- Advanced Data Compression Techniques
- Text Readability and Simplification
- Speech and Dialogue Systems
- Virtual Reality Applications and Impacts
- Forecasting Techniques and Applications
- Advanced Computing and Algorithms
- Impact of Technology on Adolescents
- Power Line Communications and Noise
- Interactive and Immersive Displays
- Evacuation and Crowd Dynamics
- Image and Signal Denoising Methods
Microsoft (United States)
2024-2025
Microsoft (Finland)
2023-2024
Technische Universität Berlin
2012-2022
Isfahan University of Medical Sciences
2012-2022
University of Mohaghegh Ardabili
2021
Deutsche Telekom (Germany)
2012-2015
Razi University
2013-2014
In this paper, we present an update to the NISQA speech quality prediction model that is focused on distortions that occur in communication networks. In contrast to the previous version, the model is trained end-to-end, and the time-dependency modelling and time-pooling are achieved through a Self-Attention mechanism. Besides overall speech quality, the model also predicts the four speech quality dimensions Noisiness, Coloration, Discontinuity, and Loudness, and in this way gives more insight into the cause of a quality degradation. Furthermore, new datasets with over 13,000 speech files were created...
The ITU-T Recommendation P.808 provides a crowdsourcing approach for conducting subjective assessment of speech quality using the Absolute Category Rating (ACR) method. We provide an open-source implementation of ITU-T Rec. P.808 that runs on the Amazon Mechanical Turk platform. We extended our implementation to include the Degradation Category Ratings (DCR) and Comparison Category Ratings (CCR) test methods. We also significantly speed up the test process by integrating the participant qualification step into the main rating task, compared to a two-stage solution. The program scripts...
The ICASSP 2023 Deep Noise Suppression (DNS) Challenge marks the fifth edition of the DNS challenge series. DNS challenges were organized from 2019 to 2023 to foster research in the field of DNS. Previous DNS challenges were held at INTERSPEECH 2020, ICASSP 2021, INTERSPEECH 2021, and ICASSP 2022. This challenge aims to advance models capable of jointly addressing denoising, dereverberation, and interfering talker suppression, with separate tracks focusing on headset and speakerphone scenarios. The challenge facilitates personalized deep noise suppression by providing accompanying enrollment clips for...
The ICASSP 2023 Speech Signal Improvement Challenge is intended to stimulate research in the area of improving the speech signal quality in communication systems. The speech signal quality can be measured with SIG in ITU-T P.835 and is still a top issue in audio communication and conferencing systems. For example, in the ICASSP 2022 Deep Noise Suppression challenge, the improvement in the background and overall quality is impressive, but the improvement in the speech signal is not statistically significant. To improve the speech signal, the following speech impairment areas must be addressed: coloration, discontinuity, loudness, reverberation, and noise. A training and test set...
Subjective speech quality assessment is the gold standard for evaluating speech enhancement processing and telecommunication systems. The commonly used ITU-T Rec. P.800 defines how to measure speech quality in lab environments, and ITU-T Rec. P.808 extended it for crowdsourcing. ITU-T Rec. P.835 extends P.800 to measure the quality of speech in the presence of noise. ITU-T Rec. P.804 targets the conversation test and introduces perceptual speech quality dimensions, which are measured during the listening phase of the conversation. The perceptual dimensions are noisiness, coloration, discontinuity, and loudness. We create a crowd-sourcing implementation...
With the coming of age of virtual/augmented reality and interactive media, numerous definitions, frameworks, and models of immersion have emerged across different fields ranging from computer graphics to literary works. Immersion is oftentimes used interchangeably with presence, as both concepts are closely related. However, there are noticeable interdisciplinary differences regarding definitions, scope, and constituents that are required to be addressed so that a coherent understanding of the concepts can be achieved. Such consensus is vital for paving...
Cloud Gaming (CG) is an immersive multimedia service that promises many benefits. In CG, the games are rendered in a cloud server, and the resulting scenes are streamed as a video sequence to the client. Using CG, users are not forced to update their gaming hardware frequently, and available games can be played on any operating system or suitable device. However, cloud gaming requires a reliable and low-latency network, which makes it a very challenging service. Transmission latency strongly affects the playability of a game and consequently reduces...
With the advances in speech communication systems such as online conferencing applications, we can seamlessly work with people regardless of where they are. However, during online meetings, speech quality can be significantly affected by background noise, reverberation, packet loss, network jitter, etc. Because of its nature, speech quality is traditionally assessed in subjective tests in laboratories and lately also in crowdsourcing, following the international standards from the ITU-T Rec. P.800 series. However, those approaches are costly and cannot...
We propose an open-source extension of the ITU-T Rec. P.910 subjective video quality test based on crowdsourcing principles. This extension addresses the speed, usage cost, and barrier to usage issues of P.910. We implement Absolute Category Rating (ACR), ACR with hidden reference (ACRHR), Degradation Category Rating (DCR), and Comparison Category Rating (CCR), and include rater, environment, hardware, and network qualifications, as well as gold and trapping questions to ensure quality. We have validated that the implementation is both accurate and highly reproducible.
The quality of speech communication systems that include noise suppression algorithms is typically evaluated in laboratory experiments according to ITU-T Rec. P.835, in which participants rate the background noise, the speech signal, and the overall quality separately. This paper introduces an open-source toolkit for conducting subjective quality evaluation of noise-suppressed speech in crowdsourcing. We followed ITU-T Rec. P.835 and P.808 and highly automate the process to prevent moderator's error. To assess the validity of our evaluation method, we compared the Mean Opinion Scores (MOS), calculate...
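As a rough illustration of how per-scale MOS values like those mentioned in the abstract above could be derived from collected ratings, the sketch below averages ratings for three P.835-style scales and attaches a 95% confidence interval. The scale labels (SIG, BAK, OVRL), the sample ratings, the function name `mos_with_ci`, and the normal-approximation interval (z = 1.96) are all assumptions made for illustration; they are not taken from the toolkit itself.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean opinion score, half-width of the confidence interval).

    Assumes a normal approximation for the interval; z=1.96 gives roughly 95%.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate an interval")
    mos = statistics.mean(ratings)
    std_err = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, z * std_err


# Hypothetical ratings on a 1-5 scale for one processed clip.
ratings_by_scale = {
    "SIG": [4, 5, 3, 4, 4],   # speech signal quality
    "BAK": [5, 4, 4, 5, 4],   # background noise intrusiveness
    "OVRL": [4, 4, 3, 4, 5],  # overall quality
}

results = {scale: mos_with_ci(votes) for scale, votes in ratings_by_scale.items()}

for scale, (mos, ci) in results.items():
    print(f"{scale}: MOS = {mos:.2f} +/- {ci:.2f}")
```

With only a handful of ratings per clip, a t-distribution interval would be the more careful choice; the fixed z value keeps the sketch dependency-free.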
Recently, a new authentication method based on 3D signatures created in the air has been proposed for mobile devices [4]. The signature is created using a properly shaped magnet (a rod or ring) taken in hand. It is based on influencing the compass sensor embedded in the new generation of mobile devices. In this paper, we present an implementation of this technology on a mobile device (iPhone 3GS). We can demonstrate the process of creating a gesture freely in the space around the device with a magnet held in hand. Movement of the magnet produces a temporal change in the magnetic field sensed by the compass sensor, and can be used as a basis for authentication. As are...
Subjective speech quality assessment has traditionally been carried out in laboratory environments under controlled conditions. With the advent of crowdsourcing platforms, tasks which need human intelligence can be resolved by crowd workers over the Internet. Crowdsourcing also offers a new paradigm for speech quality assessment, promising higher ecological validity of the quality judgments at the expense of potentially lower reliability. This paper compares laboratory-based and crowdsourcing-based speech quality assessments in terms...
The rank correlation coefficients and the rank-based statistical tests (as a subset of non-parametric techniques) might be misleading when they are applied to subjectively collected opinion scores. Those techniques assume that the data is measured at least at an ordinal level and define a sequence of scores to represent a tied rank when they have precisely an equal numeric value. In this paper, we show that the definition of tied rank, as mentioned above, is not suitable for Mean Opinion Scores (MOS) and might lead to misleading conclusions of rank-based statistical techniques. Furthermore,...
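To make the tie definition discussed above concrete, the following minimal sketch implements the conventional average-rank scheme and a Spearman correlation built on it. The MOS values and the function names are hypothetical, and the code shows only the standard definition, not the alternative the paper goes on to propose. Two conditions whose MOS differ by 0.01 receive distinct ranks, so swapping them between two tests lowers the rank correlation even though such a small difference would rarely be statistically meaningful.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks; only exactly equal values share an averaged rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend the group while the next value is exactly equal.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical MOS for four conditions measured in two separate tests.
mos_test_a = [3.51, 3.52, 4.10, 2.00]
mos_test_b = [3.52, 3.51, 4.12, 2.05]

print(average_ranks(mos_test_a))
print(round(spearman(mos_test_a, mos_test_b), 3))
```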
The quality of acoustic echo cancellers (AECs) in real-time communication systems is typically evaluated using objective metrics like ERLE [1] and PESQ [2], and less commonly with lab-based subjective tests like ITU-T Rec. P.831 [3]. We will show that these objective measures are not well correlated to subjective measures. We then introduce an open-source crowdsourcing approach for subjective evaluation of echo impairment, which can be used to evaluate the performance of AECs. We provide a study that shows this tool is highly reproducible. This new tool has been...
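For readers unfamiliar with ERLE (echo return loss enhancement), the objective metric named above, it is commonly computed as the ratio in decibels between the power of the microphone signal containing echo and the power of the residual signal after cancellation. The sketch below is a minimal whole-signal version of that textbook definition; the function name `erle_db` and the sample signals are hypothetical, and practical evaluations usually compute the value over short frames during far-end single talk.

```python
import math


def erle_db(mic_signal, residual_signal):
    """Echo return loss enhancement in dB: 10 * log10(P_mic / P_residual)."""
    if not mic_signal or not residual_signal:
        raise ValueError("signals must be non-empty")
    power_mic = sum(s * s for s in mic_signal) / len(mic_signal)
    power_res = sum(s * s for s in residual_signal) / len(residual_signal)
    if power_mic == 0 or power_res == 0:
        raise ValueError("ERLE is undefined when either signal has zero power")
    return 10 * math.log10(power_mic / power_res)


# Hypothetical samples: the canceller reduces the echo amplitude tenfold.
mic = [1.0, -1.0, 1.0, -1.0]
residual = [0.1, -0.1, 0.1, -0.1]

print(f"ERLE = {erle_db(mic, residual):.1f} dB")
```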
Commonly used datasets for evaluating video codecs are all very high quality and not representative of video typically used in video conferencing scenarios. We present the Video Conferencing Dataset (VCD) for evaluating video codecs for real-time communication, the first such dataset focused on video conferencing. VCD includes a wide variety of camera qualities and spatial and temporal information. It includes both desktop and mobile scenarios and two types of video background processing. We report the compression efficiency of H.264, H.265, H.266, and AV1 in low-delay settings and compare it with...
In this paper, the reliability of responses collected in two crowdsourcing studies is compared. Two methods to evaluate the reliability (one noticeable and one unnoticeable method for workers) have been employed. The unnoticeable method was included in both studies; the noticeable method was employed in one study only. The study containing the noticeable check resulted in a higher consistency than the other one. We assume that the difference is a result of the obvious method: workers improve their performance due to the awareness that they are being observed.