Comparison of VGG and ResNet used as Encoders for Image Captioning

Date
2020
Author
Atliha, Viktar
Šešok, Dmitrij
Abstract
Recent models for image captioning are usually based on an encoder-decoder framework, with large pre-trained convolutional neural networks often used as encoders. However, different authors use different encoder architectures for their image captioning models, which makes it difficult to isolate the effect the encoder has on overall model performance. In this paper we compare two popular convolutional network architectures, VGG and ResNet, as encoders for the same image captioning model in order to determine which produces the better image representation for caption generation. The results show that ResNet outperforms VGG, allowing the image captioning model to achieve a higher BLEU-4 score. Furthermore, ResNet allows the model to reach a score comparable to that of the VGG-based model in fewer training epochs. Based on these results we can state that the encoder plays a significant role and can considerably improve a model without any change to the decoder architecture.
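
The comparison described in the abstract amounts to swapping the CNN encoder while keeping the decoder fixed. The sketch below is not the authors' implementation; it is a minimal illustration assuming PyTorch and torchvision, with a toy LSTM decoder and hypothetical vocabulary and hidden sizes, showing how VGG-16 and ResNet-50 backbones can be used interchangeably as frozen encoders feeding the same caption decoder.

# Minimal sketch (illustrative only, not the authors' code): two interchangeable
# frozen CNN encoders feeding one caption decoder. Assumes PyTorch + torchvision.
import torch
import torch.nn as nn
import torchvision.models as models


def build_encoder(name: str) -> tuple[nn.Module, int]:
    """Return a frozen CNN feature extractor and its output feature size."""
    if name == "vgg16":
        # In practice pretrained weights would be loaded (e.g. weights="DEFAULT").
        cnn = models.vgg16(weights=None)
        encoder = nn.Sequential(*list(cnn.features),
                                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        feat_dim = 512
    elif name == "resnet50":
        cnn = models.resnet50(weights=None)
        # Drop the final fully connected classification head.
        encoder = nn.Sequential(*list(cnn.children())[:-1], nn.Flatten())
        feat_dim = 2048
    else:
        raise ValueError(f"unknown encoder: {name}")
    for p in encoder.parameters():   # freeze the encoder; only the decoder is trained
        p.requires_grad = False
    return encoder, feat_dim


class CaptionDecoder(nn.Module):
    """Toy LSTM decoder conditioned on the image feature vector."""
    def __init__(self, feat_dim: int, vocab_size: int = 10000,
                 embed_dim: int = 256, hidden: int = 512):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden)   # image feature -> initial hidden state
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, feats: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        h0 = torch.tanh(self.init_h(feats)).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        y, _ = self.lstm(self.embed(captions), (h0, c0))
        return self.out(y)           # (batch, seq_len, vocab_size) logits


if __name__ == "__main__":
    images = torch.randn(2, 3, 224, 224)           # dummy image batch
    captions = torch.randint(0, 10000, (2, 15))    # dummy token ids
    for name in ("vgg16", "resnet50"):
        encoder, feat_dim = build_encoder(name)
        decoder = CaptionDecoder(feat_dim)
        logits = decoder(encoder(images), captions)
        print(name, logits.shape)

In such a setup only the encoder changes between runs, so any difference in BLEU-4 can be attributed to the quality of the image representation rather than to the decoder.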
URI
https://etalpykla.vilniustech.lt/handle/123456789/159547
Collections
  • 2020 International Conference "Electrical, Electronic and Information Sciences" (eStream)  [24]