- Speech and Audio Processing
- Music and Audio Processing
- Speech Recognition and Synthesis
- Advanced Adaptive Filtering Techniques
- Indoor and Outdoor Localization Technologies
- Acoustic Wave Phenomena Research
- Corporate Finance and Governance
- Flow Measurement and Analysis
- Hearing Loss and Rehabilitation
- Underwater Acoustics Research
- Music Technology and Sound Studies
- Natural Language Processing Techniques
- Speech and dialogue systems
- Topic Modeling
- Auditing, Earnings Management, Governance
- QR Code Applications and Technologies
- Corruption and Economic Development
- International Business and FDI
- Mobile Crowdsensing and Crowdsourcing
- Direction-of-Arrival Estimation Techniques
- Accounting Theory and Financial Reporting
- Global Financial Crisis and Policies
- IoT and Edge/Fog Computing
Amazon (United States)
2022-2023
Amazon (Germany)
2022
Foundation for Research and Technology Hellas
2013-2018
University of Crete
2011-2018
FORTH Institute of Computer Science
2015-2017
FORTH Institute of Electronic Structure and Laser
2014-2017
Wireless acoustic sensor networks (WASNs) are formed by a distributed group of acoustic-sensing devices featuring audio playing and recording capabilities. Current mobile computing platforms offer great possibilities for the design of audio-related applications involving acoustic-sensing nodes. In this context, acoustic source localization is one of the application domains that have attracted the most attention of the research community over the last decades. In general terms, the localization of acoustic sources can be achieved by studying energy and temporal and/or directional...
In this paper, we consider the data-association problem for the localization of multiple sound sources in a wireless acoustic sensor network, where each node is a microphone array, using direction of arrival (DOA) estimates. The data-association problem arises because the central node that receives the DOA estimates from the nodes cannot know to which source they belong. Hence, the DOAs from the different nodes that correspond to the same source must be found in order to perform accurate localization. We present a method to identify the correct association and thus accurately estimate their...
This paper proposes a real-time method for capturing and reproducing spatial audio based on a circular microphone array. Following a different approach than other recently proposed array-based methods for spatial audio, the method estimates the directions of arrival of the active sound sources on a per time-frame basis and performs source separation with a fixed superdirective beamformer, which results in more accurate modelling and reproduction of the recorded acoustic environment. The separated source signals are downmixed into one monophonic signal,...
In this work, we consider the multiple sound source location estimation and counting problem in a wireless acoustic sensor network, where each sensor consists of a microphone array. Our method is based on inferring a location estimate for each frequency of the captured signals. A clustering approach, where the number of clusters (i.e., sources) is also an unknown parameter, is then employed to decide on the number of sources and their locations. The efficiency of our proposed method is evaluated through simulations and real recordings in scenarios with up to three simultaneous...
Neural contextual biasing for end-to-end neural ASR transducers has shown significant improvements in the recognition of named entities, such as contact names or device names. However, it comes with the cost of increased compute, as the biasing layers (which are usually based on cross-attention) add complexity to the transducers. In this paper, we propose gated contextual biasing models that can estimate at runtime when biasing is needed and toggle it on or off. That way, biasing does not run for every audio frame, but only for the frames where it can be helpful to correct...
In this work we propose a grid-based method to estimate the location of multiple sources in a wireless acoustic sensor network, where each node contains a microphone array and only transmits direction-of-arrival (DOA) estimates in each time interval, minimizing the transmissions to the central processing node. We present new work on modeling the DOA estimation error in such a scenario. Through extensive, realistic simulations, we show that our method outperforms other state-of-the-art methods, in both accuracy and complexity. Localization results...
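As a rough illustration of what a grid-based search over DOA estimates can look like, the sketch below scans candidate positions on a 2-D grid and keeps the one whose bearings to the nodes best match the reported DOAs. It is a simplification under stated assumptions and not the method of the paper above: it handles a single source, uses a plain sum of squared angular errors rather than the paper's DOA error model, and the function names, area size, and grid step are hypothetical.

```python
import numpy as np


def wrap_angle(a):
    """Wrap angles (radians) into the interval [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi


def grid_localize(node_positions, doas, area=(4.0, 4.0), step=0.05):
    """Single-source grid search (illustrative assumption, not the paper's model).

    node_positions: list of (x, y) node coordinates in metres.
    doas: one DOA estimate per node, in radians, measured from the x-axis.
    Returns the grid point minimising the summed squared angular error.
    """
    xs = np.arange(0.0, area[0] + step / 2, step)
    ys = np.arange(0.0, area[1] + step / 2, step)
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    cost = np.zeros_like(gx)
    for (px, py), theta in zip(node_positions, doas):
        # Bearing from this node to every candidate grid point.
        expected = np.arctan2(gy - py, gx - px)
        cost += wrap_angle(expected - theta) ** 2
    i, j = np.unravel_index(np.argmin(cost), cost.shape)
    return float(gx[i, j]), float(gy[i, j])


if __name__ == "__main__":
    # Four hypothetical nodes at the corners of a 4 m x 4 m area.
    nodes = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
    true_source = (1.5, 2.5)
    noiseless_doas = [np.arctan2(true_source[1] - py, true_source[0] - px)
                      for px, py in nodes]
    print(grid_localize(nodes, noiseless_doas))
```

Each node contributes only one angle per time interval, so the central node needs very little data. The cost of the search grows with the number of grid points, which is the usual trade-off between resolution and complexity in grid-based approaches.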
In this paper, we consider the data association problem that arises when localizing multiple sound sources using direction of arrival (DOA) estimates from multiple microphone arrays. In such a scenario, the association of the DOAs across the arrays that correspond to the same source is unknown and must be found for accurate localization. We present an algorithm that finds the correct DOA association based on features extracted for each source that we propose. Our method results in high localization accuracy in scenarios with missed detections, reverberation, and noise and outperforms...
We present the design of a digital microphone array comprised of MEMS microphones and evaluate its potential for spatial audio capturing and direction-of-arrival (DOA) estimation, which is an essential part of encoding the soundscape. The device is a cheaper and more compact alternative to analog microphone arrays, which require external, and usually expensive, analog-to-digital converters and sound cards. However, the performance of such an array for DOA estimation and audio acquisition has not been investigated. In this work, the efficiency of the digital array is evaluated and compared to a typical analog array of the same geometry....
We propose a real-time method for coding an acoustic environment based on estimating the Direction-of-Arrival (DOA) and reproducing it using an arbitrary loudspeaker configuration or headphones. We encode the sound field with the use of one audio signal and side-information. The audio signal can be further encoded with an MP3 encoder to reduce the bitrate. We investigate how such coding can affect the spatial impression and sound quality of the reproduction. Also, we propose a lossless, efficient compression scheme for the side-information. Our method is compared with other recently proposed microphone array methods...
We present dual-attention neural biasing, an architecture designed to boost Wake Words (WW) recognition and improve inference time latency on speech recognition tasks. This architecture enables a dynamic switch for its runtime compute paths by exploiting WW spotting to select which branch of its attention networks to execute for an input audio frame. With this approach, we effectively improve WW spotting accuracy while saving runtime compute cost as defined by floating point operations (FLOPs). Using an in-house de-identified dataset, we demonstrate that the proposed dual-attention network...
Speaker localization and counting in real-life conditions remains a challenging task. The computational burden, transmission usage and synchronization issues pose several limitations. Moreover, the physical characteristics of real speakers in terms of directivity pattern and orientation, as well as the restrictions in the positioning of microphone arrays, which commonly have to be placed close to walls, deteriorate the performance. In this paper, we propose a method that accounts for adjacent wall reflections and evaluate it using...
The Forthroid is a location-based system that "augments" physical objects with multimedia information and enables users to receive information about or request services related to physical objects. It employs computer-vision techniques and Quick Response codes (QR-codes). We have implemented a prototype on Android platforms and evaluated its performance with systems metrics and subjective tests. We discuss our findings and the challenges in prototyping on the Android OS. The analysis indicates that the network and the server are the main sources of delay, while the CPU load may vary...
To achieve robust far-field automatic speech recognition (ASR), existing techniques typically employ an acoustic front end (AFE) cascaded with a neural transducer (NT) ASR model. The AFE output, however, could be unreliable, as the beamforming output in the AFE is steered to a wrong direction. A promising way to address this issue is to exploit the microphone signals before the beamforming stage and after the acoustic echo cancellation (post-AEC) in the AFE. We argue that both the AFE and post-AEC outputs are complementary and that it is possible to leverage the redundancy...
Recently, wireless acoustic sensor networks (WASNs) have received significant attention from the research community and a variety of methods have been proposed for numerous applications, such as location estimation and speech enhancement. The lack of publicly available datasets with signals recorded in WASNs presents difficulties in obtaining consistent performance indicators across the different approaches. In this paper, we present and release a dataset of real signals recorded in an outdoor WASN comprised of four microphone arrays. Our...
We introduce Caching Networks (CachingNets), a speech recognition network architecture capable of delivering faster, more accurate decoding by leveraging common speech patterns. By explicitly incorporating select sentences unique to each user into the network's design, we show how to train the model as an extension of the popular sequence transducer architecture through a multitask learning procedure. We further propose and experiment with different phrase caching policies, which are effective for virtual voice-assistant (VA)...
On-device spoken language understanding (SLU) offers the potential for significant latency savings compared to cloud-based processing, as the audio stream does not need to be transmitted to a server. We present Tiny Signal-to-interpretation (TinyS2I), an end-to-end on-device SLU approach which is focused on heavily resource constrained devices. TinyS2I brings latency reduction without accuracy degradation, by exploiting use cases when the distribution of utterances that users speak to a device is largely heavy-tailed....
Narrowband direction-of-arrival (DOA) estimates for each time-frequency (TF) point offer a parametric spatial modeling of the acoustic environment which is very commonly used in many applications, such as source separation, dereverberation, and spatial audio. However, irrespective of the narrowband DOA estimation method used, many TF-points suffer from erroneous estimates due to noise and reverberation. We propose a novel technique to yield more accurate DOA estimates in the TF-domain, through statistical modeling of each TF-point with a complex Watson distribution....