dc.contributor.author: Atliha, Viktar
dc.contributor.author: Šešok, Dmitrij
dc.date.accessioned: 2023-09-18T20:30:28Z
dc.date.available: 2023-09-18T20:30:28Z
dc.date.issued: 2020
dc.identifier.issn: 2076-3417
dc.identifier.uri: https://etalpykla.vilniustech.lt/handle/123456789/150539
dc.description.abstract: Image captioning is an important task for improving human-computer interaction as well as for a deeper understanding of the mechanisms underlying image description by humans. In recent years, this research field has developed rapidly and a number of impressive results have been achieved. Typical models are based on neural networks: convolutional ones for encoding images and recurrent ones for decoding them into text. In addition, attention mechanisms and transformers are actively used to boost performance. However, even the best models are limited in quality by a lack of data. In order to generate a variety of descriptions of objects in different situations, a large training set is needed. The commonly used datasets, although rather large in terms of the number of images, are quite small in terms of the number of different captions per image. We expanded the training dataset using text augmentation methods: augmentation with synonyms as a baseline and the state-of-the-art language model Bidirectional Encoder Representations from Transformers (BERT). As a result, models trained on the augmented datasets show better results than models trained on the dataset without augmentation.
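The synonym-replacement baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny `SYNONYMS` map, the `augment_caption` function, and the replacement probability `p` are all illustrative stand-ins (a real setup would draw synonyms from a resource such as WordNet, and the BERT variant would instead mask words and let the language model propose context-aware replacements).

```python
import random

# Toy synonym map - an illustrative stand-in for a real lexical
# resource such as WordNet.
SYNONYMS = {
    "big": ["large", "huge"],
    "dog": ["hound"],
    "runs": ["sprints", "dashes"],
}

def augment_caption(caption, synonyms=SYNONYMS, p=0.5, seed=None):
    """Return a caption variant with some words swapped for synonyms.

    Each word that has an entry in `synonyms` is replaced with
    probability `p`; all other words are kept unchanged, so the
    sentence structure of the original caption is preserved.
    """
    rng = random.Random(seed)
    out = []
    for word in caption.split():
        if word in synonyms and rng.random() < p:
            out.append(rng.choice(synonyms[word]))
        else:
            out.append(word)
    return " ".join(out)

# Generate one deterministic variant of a caption.
print(augment_caption("a big dog runs on the grass", p=1.0, seed=0))
```

Applying this repeatedly with different seeds yields several distinct captions per image, which is exactly the axis on which the abstract notes existing datasets are weak: many images, but few different captions per image.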
dc.format: PDF
dc.format.extent: p. 1-11
dc.format.medium: tekstas / txt
dc.language.iso: eng
dc.relation.isreferencedby: AGORA
dc.relation.isreferencedby: Chemical abstracts
dc.relation.isreferencedby: Genamics Journal Seek
dc.relation.isreferencedby: DOAJ
dc.relation.isreferencedby: INSPEC
dc.relation.isreferencedby: Scopus
dc.relation.isreferencedby: Science Citation Index Expanded (Web of Science)
dc.source.uri: https://www.mdpi.com/2076-3417/10/17/5978
dc.source.uri: https://doi.org/10.3390/app10175978
dc.title: Text augmentation using BERT for image captioning
dc.type: Straipsnis Web of Science DB / Article in Web of Science DB
dcterms.accessRights: This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
dcterms.license: Creative Commons – Attribution – 4.0 International
dcterms.references: 48
dc.type.pubtype: S1 - Straipsnis Web of Science DB / Web of Science DB article
dc.contributor.institution: Vilniaus Gedimino technikos universitetas
dc.contributor.faculty: Fundamentinių mokslų fakultetas / Faculty of Fundamental Sciences
dc.subject.researchfield: T 007 - Informatikos inžinerija / Informatics engineering
dc.subject.vgtuprioritizedfields: IK0303 - Dirbtinio intelekto ir sprendimų priėmimo sistemos / Artificial intelligence and decision support systems
dc.subject.ltspecializations: L106 - Transportas, logistika ir informacinės ir ryšių technologijos (IRT) / Transport, logistics and information and communication technologies
dc.subject.en: image captioning
dc.subject.en: augmentation
dc.subject.en: BERT
dcterms.sourcetitle: Applied sciences
dc.description.issue: iss. 17
dc.description.volume: vol. 10
dc.publisher.name: MDPI
dc.publisher.city: Basel
dc.identifier.wos: 000570347700001
dc.identifier.doi: 10.3390/app10175978
dc.identifier.elaba: 68046320