Image-captioning model compression

Atliha, Viktar; Šešok, Dmitrij

dc.contributor.author	Atliha, Viktar
dc.contributor.author	Šešok, Dmitrij
dc.date.accessioned	2023-09-18T16:16:47Z
dc.date.available	2023-09-18T16:16:47Z
dc.date.issued	2022
dc.identifier.issn	2076-3417
dc.identifier.uri	https://etalpykla.vilniustech.lt/handle/123456789/112616
dc.description.abstract	Image captioning is an important task at the intersection of natural language processing (NLP) and computer vision (CV). Current captioning models are accurate enough for practical tasks, but they require both large computational power and considerable storage space. Despite the practical importance of the image-captioning problem, only a few papers have investigated compressing these models for use on mobile devices. Furthermore, those works usually compress only the decoder of a typical encoder–decoder architecture, even though the encoder traditionally occupies most of the space. We applied the most efficient model-compression techniques, such as architectural changes, pruning and quantization, to several state-of-the-art image-captioning architectures. As a result, all of these models were compressed by at least 91% in terms of memory (including the encoder), while losing no more than 2% in CIDEr and no more than 4.5% in SPICE. The best model achieved 127.4 CIDEr and 21.4 SPICE with a size of only 34.8 MB. This sets a strong baseline for image-captioning model compression, and the model could be used for practical applications.	eng
dc.format	PDF
dc.format.extent	p. 1-14
dc.format.medium	text
dc.language.iso	eng
dc.relation.isreferencedby	Science Citation Index Expanded (Web of Science)
dc.relation.isreferencedby	Scopus
dc.relation.isreferencedby	DOAJ
dc.relation.isreferencedby	INSPEC
dc.relation.isreferencedby	J-Gate
dc.relation.isreferencedby	Gale's Academic OneFile
dc.source.uri	https://www.mdpi.com/2076-3417/12/3/1638/pdf
dc.title	Image-captioning model compression
dc.type	Article in Web of Science DB
dcterms.accessRights	This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/)
dcterms.license	Creative Commons – Attribution – 4.0 International
dcterms.references	61
dc.type.pubtype	S1 - Web of Science DB article
dc.contributor.institution	Vilniaus Gedimino technikos universitetas (Vilnius Gediminas Technical University)
dc.contributor.faculty	Faculty of Fundamental Sciences
dc.subject.researchfield	T 007 - Informatics engineering
dc.subject.researchfield	N 009 - Computer science
dc.subject.studydirection	B04 - Informatics engineering
dc.subject.vgtuprioritizedfields	IK0303 - Artificial intelligence and decision support systems
dc.subject.ltspecializations	L106 - Transport, logistics and information and communication technologies (ICT)
dc.subject.en	image captioning
dc.subject.en	model compression
dc.subject.en	pruning
dc.subject.en	quantization
dcterms.sourcetitle	Applied Sciences
dc.description.issue	iss. 3
dc.description.volume	vol. 12
dc.publisher.name	MDPI
dc.publisher.city	Basel
dc.identifier.other	WOS:000756109800001
dc.identifier.doi	10.3390/app12031638
dc.identifier.elaba	118832017
