Comparison of VGG and ResNet used as encoders for image captioning

Atliha, Viktar; Šešok, Dmitrij

dc.contributor.author	Atliha, Viktar
dc.contributor.author	Šešok, Dmitrij
dc.date.accessioned	2023-09-18T20:29:20Z
dc.date.available	2023-09-18T20:29:20Z
dc.date.issued	2020
dc.identifier.uri	https://etalpykla.vilniustech.lt/handle/123456789/150327
dc.description.abstract	Recent models for image captioning are usually based on an encoder-decoder framework, in which large pre-trained convolutional neural networks are often used as encoders. However, different authors use different encoder architectures for their image captioning models, which makes it difficult to isolate the effect of the encoder on overall model performance. In this paper we compare two popular convolutional network architectures, VGG and ResNet, as encoders for the same image captioning model in order to determine which one provides the better image representation for caption generation. The results show that ResNet outperforms VGG, allowing the image captioning model to achieve a higher BLEU-4 score. Furthermore, the ResNet-based model reaches a score comparable to that of the VGG-based model in fewer training epochs. Based on these results, we conclude that the encoder plays an important role and can significantly improve the model without any change to the decoder architecture.	eng
dc.format	PDF
dc.format.extent	p. 1-4
dc.format.medium	text / txt
dc.language.iso	eng
dc.relation.isreferencedby	Scopus
dc.relation.isreferencedby	IEEE Xplore
dc.source.uri	https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9108880
dc.title	Comparison of VGG and ResNet used as encoders for image captioning
dc.type	Paper in conference publication in Scopus DB
dcterms.references	25
dc.type.pubtype	P1b - Article in conference proceedings Scopus DB
dc.contributor.institution	Vilniaus Gedimino technikos universitetas
dc.contributor.faculty	Faculty of Fundamental Sciences
dc.subject.researchfield	T 007 - Informatics engineering
dc.subject.vgtuprioritizedfields	IK0303 - Artificial intelligence and decision support systems
dc.subject.ltspecializations	L106 - Transport, logistics and information and communication technologies (ICT)
dc.subject.en	image captioning
dc.subject.en	encoder-decoder framework
dc.subject.en	convolutional neural networks
dc.subject.en	VGG
dc.subject.en	ResNet
dcterms.sourcetitle	2020 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream), 30 April 2020, Vilnius, Lithuania: proceedings of the conference / organized by Vilnius Gediminas Technical University
dc.publisher.name	IEEE
dc.publisher.city	New York
dc.identifier.doi	10.1109/eStream50540.2020.9108880
dc.identifier.elaba	62631491
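The abstract reports model quality as a BLEU-4 score. The record does not say which evaluation toolkit the authors used, so the following is only a minimal sketch of the standard corpus-level BLEU-4 (clipped 1- to 4-gram precisions, uniform weights of 0.25, geometric mean, brevity penalty), assuming whitespace-tokenized captions and no smoothing. The helper names `ngrams` and `corpus_bleu4` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of length n in a token sequence (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(candidates: List[List[str]], references: List[List[List[str]]]) -> float:
    """Corpus-level BLEU-4 with uniform weights and no smoothing (illustrative sketch).

    candidates[i] is one tokenized generated caption; references[i] is the list of
    tokenized ground-truth captions for the same image.
    """
    matches = [0] * 4  # clipped n-gram matches for n = 1..4
    totals = [0] * 4   # n-grams produced by the candidates for n = 1..4
    cand_len = 0
    ref_len = 0
    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # Effective reference length: the reference closest in length to the
        # candidate, with ties broken toward the shorter reference.
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, 5):
            cand_counts = ngrams(cand, n)
            # Clip each candidate n-gram count by its maximum count in any one reference.
            max_ref: Counter = Counter()
            for r in refs:
                for gram, count in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())
    if min(matches) == 0:
        # At least one precision is zero, so the unsmoothed geometric mean is zero.
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    # Penalize candidate sets that are shorter than their references.
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    ref = "a man rides a horse on the beach".split()
    print(corpus_bleu4([ref], [[ref]]))  # identical caption
    print(corpus_bleu4(["a man rides a horse".split()], [[ref]]))  # short caption
```

In the second call every n-gram of the candidate appears in the reference, so all four precisions are 1 and the score is just the brevity penalty exp(1 - 8/5). Because the precisions are combined by a geometric mean, a caption set with no matching 4-gram scores 0 in this sketch; toolkits often add smoothing for sentence-level use.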
