dc.contributor.author | Kolesau, Aliaksei | |
dc.contributor.author | Šešok, Dmitrij | |
dc.date.accessioned | 2023-09-18T16:08:01Z | |
dc.date.available | 2023-09-18T16:08:01Z | |
dc.date.issued | 2021 | |
dc.identifier.issn | 2076-3417 | |
dc.identifier.uri | https://etalpykla.vilniustech.lt/handle/123456789/111535 | |
dc.description.abstract | Voice activation systems are used to find a pre-defined word or phrase in the audio stream. Industry solutions, such as “OK, Google” for Android devices, are trained with millions of samples. In this work, we propose and investigate several ways to train a voice activation system when the in-domain data set is small. We compare self-training exemplar pre-training, fine-tuning a model pre-trained on another domain, joint training on both an out-of-domain high-resource and a target low-resource data set, and unsupervised pre-training. In our experiments, the unsupervised pre-training and the joint-training with a high-resource data set from another domain significantly outperform a strong baseline of fine-tuning a model trained on another data set. We obtain 7–25% relative improvement depending on the model architecture. Additionally, we improve the best test accuracy on the Lithuanian data set from 90.77% to 93.85%. | eng |
dc.format | PDF | |
dc.format.extent | p. 1-16 | |
dc.format.medium | text / txt | |
dc.language.iso | eng | |
dc.relation.isreferencedby | Science Citation Index Expanded (Web of Science) | |
dc.relation.isreferencedby | Scopus | |
dc.source.uri | https://www.mdpi.com/2076-3417/11/14/6298 | |
dc.title | Voice activation for low-resource languages | |
dc.type | Straipsnis Web of Science DB / Article in Web of Science DB | |
dcterms.accessRights | This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/) | |
dcterms.license | Creative Commons – Attribution – 4.0 International | |
dcterms.references | 48 | |
dc.type.pubtype | S1 - Straipsnis Web of Science DB / Web of Science DB article | |
dc.contributor.institution | Vilniaus Gedimino technikos universitetas | |
dc.contributor.faculty | Fundamentinių mokslų fakultetas / Faculty of Fundamental Sciences | |
dc.subject.researchfield | T 007 - Informatikos inžinerija / Informatics engineering | |
dc.subject.researchfield | N 009 - Informatika / Computer science | |
dc.subject.studydirection | B04 - Informatikos inžinerija / Informatics engineering | |
dc.subject.vgtuprioritizedfields | IK0303 - Dirbtinio intelekto ir sprendimų priėmimo sistemos / Artificial intelligence and decision support systems | |
dc.subject.ltspecializations | L106 - Transportas, logistika ir informacinės ir ryšių technologijos (IRT) / Transport, logistic and information and communication technologies | |
dc.subject.en | voice activation | |
dc.subject.en | low-resource languages | |
dc.subject.en | unsupervised learning | |
dc.subject.en | pre-training | |
dc.subject.en | keyword spotter | |
dc.subject.en | neural network | |
dc.subject.en | ResNet | |
dcterms.sourcetitle | Applied sciences | |
dc.description.issue | iss. 14 | |
dc.description.volume | vol. 11 | |
dc.publisher.name | MDPI | |
dc.publisher.city | Basel | |
dc.identifier.doi | 000675961100001 | |
dc.identifier.doi | 10.3390/app11146298 | |
dc.identifier.elaba | 100087049 | |