Investigation of Acoustic Features for the Voice Activation Problem
Abstract
In this paper we examine the effect of different acoustic feature computation pipelines on classifying audio keywords with a convolutional neural network (CNN). We compare Mel-frequency cepstral coefficients (MFCCs) with a simple filterbank averaging technique, and we also examine the influence of MFCC computation parameters on classification quality. The results show that CNNs benefit from using prior knowledge in acoustic feature computation: in our experiments, switching from MFCCs to filterbank averaging led to a 30% drop in accuracy. Furthermore, the default MFCC parameter values used in many libraries may not be optimal for the voice activation problem: a frame length of 55 ms yielded better results than the default length of 20 ms.
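The frame-length finding can be reproduced with standard tooling. Below is a minimal sketch of computing MFCCs with a 55 ms analysis window instead of a shorter library default; the paper does not specify which toolkit was used, so the parameter names here follow librosa, the 10 ms hop and the file name are illustrative assumptions.

```python
import librosa

# Load a keyword utterance at 16 kHz (file name is hypothetical).
y, sr = librosa.load("keyword.wav", sr=16000)

frame_len = int(0.055 * sr)  # 55 ms window -> 880 samples at 16 kHz
hop_len = int(0.010 * sr)    # 10 ms hop (an assumed, common choice)

# 13 MFCCs per frame with the longer analysis window; kwargs are
# forwarded to the underlying mel spectrogram computation.
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                             n_fft=frame_len, hop_length=hop_len)
print(mfccs.shape)  # (13, number_of_frames)
```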
