dc.contributor.author: Atliha, Viktar
dc.contributor.author: Šešok, Dmitrij
dc.date.accessioned: 2023-09-18T20:30:28Z
dc.date.available: 2023-09-18T20:30:28Z
dc.date.issued: 2020
dc.identifier.issn: 2076-3417
dc.identifier.uri: https://etalpykla.vilniustech.lt/handle/123456789/150539
dc.description.abstract: Image captioning is an important task for improving human-computer interaction as well as for a deeper understanding of the mechanisms underlying image description by humans. In recent years, this research field has developed rapidly and a number of impressive results have been achieved. Typical models are based on neural networks: convolutional ones for encoding images and recurrent ones for decoding them into text. In addition, attention mechanisms and transformers are actively used to boost performance. However, even the best models are limited in quality by a lack of data. In order to generate a variety of descriptions of objects in different situations, a large training set is needed. The commonly used datasets, although rather large in terms of the number of images, are quite small in terms of the number of different captions per image. We expanded the training dataset using text augmentation methods: augmentation with synonyms as a baseline and the state-of-the-art language model Bidirectional Encoder Representations from Transformers (BERT). As a result, models trained on the augmented datasets show better results than models trained on the dataset without augmentation.
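The synonym-replacement baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny `SYNONYMS` map, the `augment_caption` function, and the replacement probability `p` are all illustrative stand-ins (a real setup would draw synonyms from a resource such as WordNet, and the BERT variant would instead mask words and let the language model propose context-aware replacements).

```python
import random

# Toy synonym map - an illustrative stand-in for a real lexical
# resource such as WordNet.
SYNONYMS = {
    "big": ["large", "huge"],
    "dog": ["hound"],
    "runs": ["sprints", "dashes"],
}

def augment_caption(caption, synonyms=SYNONYMS, p=0.5, seed=None):
    """Return a caption variant with some words swapped for synonyms.

    Each word that has an entry in `synonyms` is replaced with
    probability `p`; all other words are kept unchanged, so the
    sentence structure of the original caption is preserved.
    """
    rng = random.Random(seed)
    out = []
    for word in caption.split():
        if word in synonyms and rng.random() < p:
            out.append(rng.choice(synonyms[word]))
        else:
            out.append(word)
    return " ".join(out)

# Generate one deterministic variant of a caption.
print(augment_caption("a big dog runs on the grass", p=1.0, seed=0))
```

Applying this repeatedly with different seeds yields several distinct captions per image, which is exactly the axis on which the abstract notes existing datasets are weak: many images, but few different captions per image.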
dc.format: PDF
dc.format.extent: p. 1-11
dc.format.medium: tekstas / txt
dc.language.iso: eng
dc.relation.isreferencedby: AGORA
dc.relation.isreferencedby: Chemical abstracts
dc.relation.isreferencedby: Genamics Journal Seek
dc.relation.isreferencedby: DOAJ
dc.relation.isreferencedby: INSPEC
dc.relation.isreferencedby: Scopus
dc.relation.isreferencedby: Science Citation Index Expanded (Web of Science)
dc.source.uri: https://www.mdpi.com/2076-3417/10/17/5978
dc.source.uri: https://doi.org/10.3390/app10175978
dc.title: Text augmentation using BERT for image captioning
dc.type: Straipsnis Web of Science DB / Article in Web of Science DB
dcterms.accessRights: This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
dcterms.license: Creative Commons – Attribution – 4.0 International
dcterms.references: 48
dc.type.pubtype: S1 - Straipsnis Web of Science DB / Web of Science DB article
dc.contributor.institution: Vilniaus Gedimino technikos universitetas
dc.contributor.faculty: Fundamentinių mokslų fakultetas / Faculty of Fundamental Sciences
dc.subject.researchfield: T 007 - Informatikos inžinerija / Informatics engineering
dc.subject.vgtuprioritizedfields: IK0303 - Dirbtinio intelekto ir sprendimų priėmimo sistemos / Artificial intelligence and decision support systems
dc.subject.ltspecializations: L106 - Transportas, logistika ir informacinės ir ryšių technologijos (IRT) / Transport, logistics and information and communication technologies
dc.subject.en: image captioning
dc.subject.en: augmentation
dc.subject.en: BERT
dcterms.sourcetitle: Applied sciences
dc.description.issue: iss. 17
dc.description.volume: vol. 10
dc.publisher.name: MDPI
dc.publisher.city: Basel
dc.identifier.wos: 000570347700001
dc.identifier.doi: 10.3390/app10175978
dc.identifier.elaba: 68046320