Show simple item record

dc.contributor.author: Kolesau, Aliaksei
dc.contributor.author: Šešok, Dmitrij
dc.date.accessioned: 2023-09-18T20:34:41Z
dc.date.available: 2023-09-18T20:34:41Z
dc.date.issued: 2020
dc.identifier.issn: 2076-3417
dc.identifier.uri: https://etalpykla.vilniustech.lt/handle/123456789/151022
dc.description.abstract: The problem of voice activation is to find a pre-defined word in the audio stream. Solutions such as keyword spotter “Ok, Google” for Android devices or keyword spotter “Alexa” for Amazon devices use tens of thousands to millions of keyword examples in training. In this paper, we explore the possibility of using pre-trained audio features to build voice activation with a small number of keyword examples. The contribution of this article consists of two parts. First, we investigate the dependence of the quality of the voice activation system on the number of examples in training for English and Russian and show that the use of pre-trained audio features, such as wav2vec, increases the accuracy of the system by up to 10% if only seven examples are available for each keyword during training. At the same time, the benefits of such features become less and disappear as the dataset size increases. Secondly, we prepare and provide for general use a dataset for training and testing voice activation for the Lithuanian language. We also provide training results on this dataset. [eng]
dc.format: PDF
dc.format.extent: p. 1-13
dc.format.medium: tekstas / txt
dc.language.iso: eng
dc.relation.isreferencedby: Science Citation Index Expanded (Web of Science)
dc.relation.isreferencedby: Scopus
dc.relation.isreferencedby: INSPEC
dc.relation.isreferencedby: DOAJ
dc.relation.isreferencedby: Chemical abstracts
dc.relation.isreferencedby: Genamics Journal Seek
dc.source.uri: https://www.mdpi.com/2076-3417/10/23/8643
dc.title: Unsupervised pre-training for voice activation
dc.type: Straipsnis Web of Science DB / Article in Web of Science DB
dcterms.accessRights: This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
dcterms.license: Creative Commons – Attribution – 4.0 International
dcterms.references: 39
dc.type.pubtype: S1 - Straipsnis Web of Science DB / Web of Science DB article
dc.contributor.institution: Vilniaus Gedimino technikos universitetas
dc.contributor.faculty: Fundamentinių mokslų fakultetas / Faculty of Fundamental Sciences
dc.subject.researchfield: T 007 - Informatikos inžinerija / Informatics engineering
dc.subject.studydirection: B04 - Informatikos inžinerija / Informatics engineering
dc.subject.vgtuprioritizedfields: IK0303 - Dirbtinio intelekto ir sprendimų priėmimo sistemos / Artificial intelligence and decision support systems
dc.subject.ltspecializations: L106 - Transportas, logistika ir informacinės ir ryšių technologijos (IRT) / Transport, logistic and information and communication technologies
dc.subject.en: voice activation
dc.subject.en: unsupervised learning
dc.subject.en: pre-training
dc.subject.en: keyword spotter
dc.subject.en: neural network
dc.subject.en: ResNet
dcterms.sourcetitle: Applied sciences: Section computing and artificial intelligence
dc.description.issue: iss. 23
dc.description.volume: vol. 10
dc.publisher.name: MDPI
dc.publisher.city: Basel
dc.identifier.doi: 000597108900001
dc.identifier.doi: 10.3390/app10238643
dc.identifier.elaba: 76584107

