| dc.contributor.author | Atliha, Viktar | |
| dc.contributor.author | Šešok, Dmitrij | |
| dc.date.accessioned | 2023-09-18T20:29:20Z | |
| dc.date.available | 2023-09-18T20:29:20Z | |
| dc.date.issued | 2020 | |
| dc.identifier.uri | https://etalpykla.vilniustech.lt/handle/123456789/150327 | |
| dc.description.abstract | Recent models for image captioning are usually based on an encoder-decoder framework. Large pre-trained convolutional neural networks are often used as encoders. However, different authors use different encoder architectures for their image captioning models, which makes it difficult to determine the effect that the encoder has on overall model performance. In this paper we compare two popular convolutional network architectures, VGG and ResNet, as encoders for the same image captioning model in order to find out which is better at producing the image representation used for caption generation. The results show that ResNet outperforms VGG, allowing the image captioning model to achieve a higher BLEU-4 score. Furthermore, the results show that ResNet allows the model to achieve a score comparable to that of the VGG-based model in fewer training epochs. Based on these data, we can state that the encoder plays a big role and can significantly improve the model without changing the decoder architecture. | eng |
| dc.format | PDF | |
| dc.format.extent | p. 1-4 | |
| dc.format.medium | text / txt | |
| dc.language.iso | eng | |
| dc.relation.isreferencedby | Scopus | |
| dc.relation.isreferencedby | IEEE Xplore | |
| dc.source.uri | https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9108880 | |
| dc.title | Comparison of VGG and ResNet used as encoders for image captioning | |
| dc.type | Paper in conference publication in Scopus DB | |
| dcterms.references | 25 | |
| dc.type.pubtype | P1b - Article in conference proceedings Scopus DB | |
| dc.contributor.institution | Vilniaus Gedimino technikos universitetas | |
| dc.contributor.faculty | Faculty of Fundamental Sciences | |
| dc.subject.researchfield | T 007 - Informatics engineering | |
| dc.subject.vgtuprioritizedfields | IK0303 - Artificial intelligence and decision support systems | |
| dc.subject.ltspecializations | L106 - Transport, logistics, and information and communication technologies (ICT) | |
| dc.subject.en | image captioning | |
| dc.subject.en | encoder-decoder framework | |
| dc.subject.en | convolutional neural networks | |
| dc.subject.en | VGG | |
| dc.subject.en | ResNet | |
| dcterms.sourcetitle | 2020 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream), 30 April 2020, Vilnius, Lithuania: proceedings of the conference / organized by Vilnius Gediminas Technical University | |
| dc.publisher.name | IEEE | |
| dc.publisher.city | New York | |
| dc.identifier.doi | 10.1109/eStream50540.2020.9108880 | |
| dc.identifier.elaba | 62631491 | |