dc.contributor.author | Atliha, Viktar | |
dc.contributor.author | Šešok, Dmitrij | |
dc.date.accessioned | 2023-09-18T20:30:28Z | |
dc.date.available | 2023-09-18T20:30:28Z | |
dc.date.issued | 2020 | |
dc.identifier.issn | 2076-3417 | |
dc.identifier.uri | https://etalpykla.vilniustech.lt/handle/123456789/150539 | |
dc.description.abstract | Image captioning is an important task for improving human-computer interaction, as well as for gaining a deeper understanding of the mechanisms underlying image description by humans. In recent years, this research field has developed rapidly and a number of impressive results have been achieved. Typical models are based on neural networks, including convolutional networks for encoding images and recurrent networks for decoding them into text. In addition, attention mechanisms and transformers are actively used to boost performance. However, even the best models are limited in quality by a lack of data: generating a variety of descriptions of objects in different situations requires a large training set. The commonly used datasets, although rather large in terms of the number of images, are quite small in terms of the number of different captions per image. We expanded the training dataset using text augmentation methods: augmentation with synonyms as a baseline, and the state-of-the-art language model Bidirectional Encoder Representations from Transformers (BERT). As a result, models trained on the augmented datasets show better results than models trained on the dataset without augmentation. | eng |
dc.format | PDF | |
dc.format.extent | p. 1-11 | |
dc.format.medium | tekstas / txt | |
dc.language.iso | eng | |
dc.relation.isreferencedby | AGORA | |
dc.relation.isreferencedby | Chemical abstracts | |
dc.relation.isreferencedby | Genamics Journal Seek | |
dc.relation.isreferencedby | DOAJ | |
dc.relation.isreferencedby | INSPEC | |
dc.relation.isreferencedby | Scopus | |
dc.relation.isreferencedby | Science Citation Index Expanded (Web of Science) | |
dc.source.uri | https://www.mdpi.com/2076-3417/10/17/5978 | |
dc.source.uri | https://doi.org/10.3390/app10175978 | |
dc.title | Text augmentation using BERT for image captioning | |
dc.type | Straipsnis Web of Science DB / Article in Web of Science DB | |
dcterms.accessRights | This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). | |
dcterms.license | Creative Commons – Attribution – 4.0 International | |
dcterms.references | 48 | |
dc.type.pubtype | S1 - Straipsnis Web of Science DB / Web of Science DB article | |
dc.contributor.institution | Vilniaus Gedimino technikos universitetas | |
dc.contributor.faculty | Fundamentinių mokslų fakultetas / Faculty of Fundamental Sciences | |
dc.subject.researchfield | T 007 - Informatikos inžinerija / Informatics engineering | |
dc.subject.vgtuprioritizedfields | IK0303 - Dirbtinio intelekto ir sprendimų priėmimo sistemos / Artificial intelligence and decision support systems | |
dc.subject.ltspecializations | L106 - Transportas, logistika ir informacinės ir ryšių technologijos (IRT) / Transport, logistic and information and communication technologies | |
dc.subject.en | image captioning | |
dc.subject.en | augmentation | |
dc.subject.en | BERT | |
dcterms.sourcetitle | Applied sciences | |
dc.description.issue | iss. 17 | |
dc.description.volume | vol. 10 | |
dc.publisher.name | MDPI | |
dc.publisher.city | Basel | |
dc.identifier.wos | 000570347700001 | |
dc.identifier.doi | 10.3390/app10175978 | |
dc.identifier.elaba | 68046320 | |