Receptive field in neural network keyword spotting models
Abstract
Many keyword spotting models use neural networks to detect acoustic events such as phonemes, word pieces, or whole words. The model is run on every frame (a segmented piece of audio), typically every 10 ms. To improve classification quality, the neural network uses audio features from both the frame under classification and several adjacent frames. This introduces a tradeoff: a receptive field that is too large may cause overfitting and increases the number of parameters and the latency, while one that is too small may not provide enough information to classify the audio event correctly. We investigate several policies for constructing the receptive field of a neural network for keyword spotting, including ways to make the receptive field sparser, such as frame skipping and frame stacking.
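The frame-selection idea above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the left/right context sizes, and the `skip` parameter are all hypothetical, and the features are synthetic random vectors standing in for per-frame acoustic features.

```python
import numpy as np

def receptive_field_indices(center, left=10, right=5, skip=1):
    """Indices of the frames feeding the classifier for frame `center`.

    skip=1 keeps every frame in the window (a dense receptive field);
    skip=2 keeps every other frame (a sparser field covering the same
    time span with fewer inputs). Hypothetical helper for illustration.
    """
    return list(range(center - left, center + right + 1, skip))

def stack_frames(features, indices):
    """Concatenate the selected frames' feature vectors into one input."""
    return np.concatenate([features[i] for i in indices])

# 100 frames of 40-dimensional per-frame features (synthetic data)
feats = np.random.randn(100, 40)

# Sparse receptive field: every other frame from 10 frames of left
# context to 5 frames of right context around frame 50
idx = receptive_field_indices(center=50, left=10, right=5, skip=2)
x = stack_frames(feats, idx)
print(len(idx), x.shape)
```

With `skip=2` the window spans the same 16-frame interval as the dense case but selects only 8 frames, halving the input dimensionality; this is the kind of sparsity/coverage tradeoff the policies in the paper explore.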