CNN-Based Processing of Acoustic and Radio Frequency Signals for Speaker Localization from MAVs
Direction of arrival and distance regression; Multi-stage convolutional neural network; RF fusion; RF-assisted multi-channel speech; Speaker localization; Speech
0202 electrical engineering, electronic engineering, information engineering
02 engineering and technology
DOI:
10.21437/interspeech.2021-886
Publication Date:
2021-08-27T05:59:39Z
AUTHORS (4)
ABSTRACT
A novel speaker localization algorithm from micro aerial vehicles (MAVs) is investigated. It jointly predicts the direction of arrival (DOA) and the distance by processing multi-channel speech data and fusing it with radio frequency (RF) measurements of the received signal strength. Possible applications include unmanned aerial vehicle (UAV)-based reconnaissance and surveillance against intrusions, as well as search and rescue in hostile environments. A three-stage convolutional neural network (CNN) with a fusion layer is proposed for this task, with the objective of augmenting source localization from multi-channel speech signals. Two parallel CNNs process the speech and RF data; after the fusion layer, a regression network predicts the angle and distance to the source. To demonstrate the effectiveness of this RF-assisted method, the experimental scenario and datasets are presented, and the experiments and their results are discussed.
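The data flow described in the abstract (two parallel CNN branches, a fusion layer, then a regression head for angle and distance) can be illustrated with a minimal forward-pass sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the class name `FusionLocalizer`, the single convolution per branch, the filter counts and kernel size, fusion by concatenation, the four-microphone input, and the untrained random weights. The outputs are therefore not meaningful predictions; the code only shows how the three stages connect.

```python
import random


def conv1d_relu(x, kernels):
    """Valid 1-D convolution over a multi-channel input, followed by ReLU.

    x is a list of channels (C_in x T); kernels is C_out x C_in x K.
    Returns a feature map of shape C_out x (T - K + 1).
    """
    t = len(x[0])
    out = []
    for kern in kernels:
        k = len(kern[0])
        row = []
        for start in range(t - k + 1):
            s = sum(kern[c][j] * x[c][start + j]
                    for c in range(len(x)) for j in range(k))
            row.append(max(0.0, s))
        out.append(row)
    return out


def global_avg_pool(fmap):
    """Collapse each feature map to its mean, giving one value per filter."""
    return [sum(row) / len(row) for row in fmap]


def random_kernels(rng, c_out, c_in, k):
    """Hypothetical initializer: small uniform random filter weights."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(k)]
             for _ in range(c_in)] for _ in range(c_out)]


def linear(vec, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


class FusionLocalizer:
    """Illustrative stand-in for a speech branch, an RF branch, and a regressor."""

    def __init__(self, n_mics=4, n_rf=1, n_filters=8, kernel=5, seed=0):
        rng = random.Random(seed)
        self.speech_k = random_kernels(rng, n_filters, n_mics, kernel)  # stage 1a
        self.rf_k = random_kernels(rng, n_filters, n_rf, kernel)        # stage 1b
        # Regression head: 2 outputs (angle, distance) from the fused features.
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(2 * n_filters)]
                  for _ in range(2)]
        self.b = [0.0, 0.0]

    def forward(self, speech, rssi):
        speech_feat = global_avg_pool(conv1d_relu(speech, self.speech_k))
        rf_feat = global_avg_pool(conv1d_relu(rssi, self.rf_k))
        fused = speech_feat + rf_feat  # fusion by concatenation (assumed)
        angle, distance = linear(fused, self.w, self.b)
        return angle, distance


# Synthetic inputs: 4 microphone channels of 64 samples and one RSSI trace.
rng = random.Random(1)
speech = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(4)]
rssi = [[-60.0 + rng.gauss(0.0, 1.0) for _ in range(16)]]

model = FusionLocalizer()
angle, distance = model.forward(speech, rssi)
print(angle, distance)
```

Concatenation is only one possible fusion choice, picked here because it keeps the two branches independent until the regression stage. In a trained system the weights would be learned from paired speech and RF recordings with ground-truth angle and distance labels.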