Multiple sound source localization in three dimensions using convolutional neural networks and clustering based post-processing
Date
2022
Authors
Sakavičius, Saulius
Serackis, Artūras
Abromavičius, Vytautas
Abstract
Sound source localization methods are successfully applied to various estimation tasks, such as tracking and detecting objects, aiming cameras, and navigating robots. However, three-dimensional acoustic source localization usually relies on large and complex distributed microphone arrays. This study proposes a convolutional neural network architecture for three-dimensional sound source localization using a single tetrahedral microphone array. The phase component of the microphone array signal spectrum was designed as the model input, while the model output represents the three-dimensional space. The paper provides extensive experimental results for the method on a semi-synthetic audio data set and with a real-world microphone array. Furthermore, clustering-based post-processing is shown to increase the accuracy of three-dimensional localization by more than 30%. Experiments on a data set synthesized with the image source method showed a localization uncertainty of 1.08 m. The azimuth estimates of the investigated sound sources had a mean absolute error of 18.97°, and the elevation estimates had an error of 48.49°. An additional advantage of the proposed method is its ability to predict the sound source location from a single signal analysis frame. This enables near-instantaneous localization and makes the method suitable for many other applications. The proposed solution does not require intensive preprocessing of the audio signals and can serve as a microphone-array-based pointing system for a video camera. Future work should investigate localization performance for more than two sound sources and assess performance under varying acoustic conditions.
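The abstract states that the network input is the phase component of the array signal spectrum, taken from a single analysis frame. A minimal sketch of such a feature follows. It makes several assumptions that the paper does not specify: a periodic Hann window, a plain one-sided DFT per microphone channel, a channels x bins layout, and the function names, which are hypothetical. It illustrates the idea and is not the authors' implementation. A time delay between microphones appears as a phase shift of -2πkd/N at frequency bin k, which is the cue a network can exploit.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n (assumed analysis window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def phase_spectrum(frame: list[float]) -> list[float]:
    """Phase in radians of the one-sided DFT of one windowed frame."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hann(n))]
    phases = []
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        phases.append(cmath.phase(acc))  # wrapped to [-pi, pi]
    return phases


def array_phase_features(frames: list[list[float]]) -> list[list[float]]:
    """Stack per-microphone phase spectra into a channels x bins feature map."""
    if len({len(f) for f in frames}) != 1:
        raise ValueError("all channels must share the same frame length")
    return [phase_spectrum(f) for f in frames]


if __name__ == "__main__":
    # Hypothetical 4-channel frame (e.g. one per tetrahedral microphone):
    # a tone on DFT bin 2, delayed by 0..3 samples on successive channels.
    n, k0 = 16, 2
    frames = [[math.cos(2 * math.pi * k0 * (t - d) / n) for t in range(n)]
              for d in range(4)]
    features = array_phase_features(frames)
    print([round(ch[k0], 4) for ch in features])
```

Raw wrapped phase is used here for simplicity. A practical pipeline might also encode it as sine and cosine pairs to avoid the discontinuity at ±π.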