Comparison of VGG and ResNet used as Encoders for Image Captioning

Atliha, Viktar; Šešok, Dmitrij

dc.rights.license	Visos teisės saugomos / All rights reserved	en_US
dc.contributor.author	Atliha, Viktar
dc.contributor.author	Šešok, Dmitrij
dc.date.accessioned	2025-12-15T11:12:51Z
dc.date.available	2025-12-15T11:12:51Z
dc.date.issued	2020
dc.identifier.isbn	9781728197807	en_US
dc.identifier.uri	https://etalpykla.vilniustech.lt/handle/123456789/159547
dc.description.abstract	Recent models for image captioning are usually based on an encoder-decoder framework. Large pre-trained convolutional neural networks are often used as encoders. However, different authors use different encoder architectures for their image captioning models, which makes it difficult to determine the effect that the encoder has on overall model performance. In this paper we compare two popular convolutional network architectures, VGG and ResNet, as encoders for the same image captioning model in order to determine which produces the better image representation for caption generation. The results show that ResNet outperforms VGG, allowing the image captioning model to achieve a higher BLEU-4 score. Furthermore, ResNet allows the model to reach a score comparable with that of the VGG-based model in fewer training epochs. Based on these results, we conclude that the encoder plays a major role and can significantly improve the model without any change to the decoder architecture.	en_US
dc.format.extent	4 p.	en_US
dc.format.medium	Tekstas / Text	en_US
dc.language.iso	en	en_US
dc.relation.uri	https://etalpykla.vilniustech.lt/handle/123456789/159395	en_US
dc.source.uri	https://ieeexplore.ieee.org/document/9108880	en_US
dc.subject	image captioning	en_US
dc.subject	encoder-decoder framework	en_US
dc.subject	convolutional neural networks	en_US
dc.subject	VGG	en_US
dc.subject	ResNet	en_US
dc.title	Comparison of VGG and ResNet used as Encoders for Image Captioning	en_US
dc.type	Konferencijos publikacija / Conference paper	en_US
dcterms.accrualMethod	Rankinis pateikimas / Manual submission	en_US
dcterms.issued	2020-06-05
dcterms.references	25	en_US
dc.description.version	Taip / Yes	en_US
dc.contributor.institution	Vilniaus Gedimino technikos universitetas	en_US
dc.contributor.institution	Vilnius Gediminas Technical University	en_US
dc.contributor.department	Informacinių technologijų katedra / Department of Information Technologies	en_US
dcterms.sourcetitle	2020 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream), April 30, 2020, Vilnius, Lithuania	en_US
dc.identifier.eisbn	9781728197791	en_US
dc.publisher.name	IEEE	en_US
dc.publisher.country	United States of America	en_US
dc.publisher.city	New York	en_US
dc.identifier.doi	https://doi.org/10.1109/eStream50540.2020.9108880	en_US

This item appears in the following collection(s)

2020 International Conference "Electrical, Electronic and Information Sciences" (eStream) [24]
