dc.contributor.author: Atliha, Viktar
dc.contributor.author: Šešok, Dmitrij
dc.date.accessioned: 2023-09-18T20:29:20Z
dc.date.available: 2023-09-18T20:29:20Z
dc.date.issued: 2020
dc.identifier.uri: https://etalpykla.vilniustech.lt/handle/123456789/150327
dc.description.abstract: Recent models for image captioning are usually based on an encoder-decoder framework. Large pre-trained convolutional neural networks are often used as encoders. However, different authors use different encoder architectures for their image captioning models, which makes it difficult to determine the effect that the encoder has on overall model performance. In this paper we compare two popular convolutional network architectures, VGG and ResNet, as encoders for the same image captioning model in order to find out which one produces the better image representation for caption generation. The results show that ResNet outperforms VGG, allowing the image captioning model to achieve a higher BLEU-4 score. Furthermore, the results show that ResNet allows the model to reach a score comparable to that of the VGG-based model in fewer training epochs. Based on these data we can state that the encoder plays a significant role and can substantially improve the model without any change to the decoder architecture.
dc.format: PDF
dc.format.extent: p. 1-4
dc.format.medium: text / txt
dc.language.iso: eng
dc.relation.isreferencedby: Scopus
dc.relation.isreferencedby: IEEE Xplore
dc.source.uri: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9108880
dc.title: Comparison of VGG and ResNet used as encoders for image captioning
dc.type: Paper in conference publication in Scopus DB
dcterms.references: 25
dc.type.pubtype: P1b - Article in conference proceedings Scopus DB
dc.contributor.institution: Vilniaus Gedimino technikos universitetas
dc.contributor.faculty: Faculty of Fundamental Sciences
dc.subject.researchfield: T 007 - Informatics engineering
dc.subject.vgtuprioritizedfields: IK0303 - Artificial intelligence and decision support systems
dc.subject.ltspecializations: L106 - Transport, logistics and information and communication technologies (ICT)
dc.subject.en: image captioning
dc.subject.en: encoder-decoder framework
dc.subject.en: convolutional neural networks
dc.subject.en: VGG
dc.subject.en: ResNet
dcterms.sourcetitle: 2020 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream), 30 April 2020, Vilnius, Lithuania: proceedings of the conference / organized by Vilnius Gediminas Technical University
dc.publisher.name: IEEE
dc.publisher.city: New York
dc.identifier.doi: 10.1109/eStream50540.2020.9108880
dc.identifier.elaba: 62631491

