AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition
DOI:
10.48550/arxiv.2401.10411
Publication Date:
2024-01-01
AUTHORS (7)
ABSTRACT
Wearable devices like smart glasses are approaching the compute capability to seamlessly generate real-time closed captions for live conversations. We build on our recently introduced directional Automatic Speech Recognition (ASR) for smart glasses that have microphone arrays, which fuses multi-channel ASR with serialized output training, for wearer/conversation-partner disambiguation as well as suppression of cross-talk speech from non-target directions and noise. When ASR work is part of a broader system-development process, one may be faced with changes to microphone geometries as system development progresses. This paper aims to make multi-channel ASR insensitive to limited variations of microphone-array geometry. We show that a model trained on multiple similar geometries is largely agnostic and generalizes well to new geometries, as long as they are not too different. Furthermore, training the model this way improves accuracy for seen geometries by 15 to 28% relative. Lastly, we refine the beamforming by a novel Non-Linearly Constrained Minimum Variance criterion.