Show simple item record

dc.rights.license: All rights reserved [en_US]
dc.contributor.author: Kvietkauskas, Tautvydas
dc.contributor.author: Stefanovič, Pavel
dc.date.accessioned: 2026-01-07T12:33:13Z
dc.date.available: 2026-01-07T12:33:13Z
dc.date.issued: 2025
dc.identifier.isbn: 9798331598747 [en_US]
dc.identifier.issn: 2831-5634 [en_US]
dc.identifier.uri: https://etalpykla.vilniustech.lt/handle/123456789/159684
dc.description.abstract: The amount of published images, texts, and other information in today's digital space is increasing rapidly. The ability to process textual and visual information simultaneously helps to interpret content more accurately and enables the application of artificial intelligence in complex situations, such as contextual analysis and real-time monitoring of social networks. Today's Large Language Models (LLMs) are based on text data, Object Recognition (OR) models on visual data, and Vision-Language (VL) models on both text and visual data. Combinations of these models can be used to build multimodal solutions for various context-extraction tasks; such solutions take both images and texts as input. This paper systematically reviews the latest research on LLM, OR, and VL models. Scientific articles were retrieved from the Web of Science and Google Scholar databases; all of the papers are freely accessible and were published between 2019 and 2024. The main objective was to summarise the results, tasks, and methods of scientific research on text, image, and image-text data analysis. The types of datasets and the languages of the texts used in the research were also reviewed. The results highlight trends and challenges in the context-extraction field that can be useful to other researchers. [en_US]
dc.format.extent: 6 p. [en_US]
dc.format.medium: Text [en_US]
dc.language.iso: en [en_US]
dc.relation.uri: https://etalpykla.vilniustech.lt/handle/123456789/159405 [en_US]
dc.source.uri: https://ieeexplore.ieee.org/document/11016903 [en_US]
dc.title: Trends and Challenges of Multimodal Solutions for Text and Image Context Extraction [en_US]
dc.type: Conference paper [en_US]
dcterms.accrualMethod: Manual submission [en_US]
dcterms.issued: 2025-06-02
dcterms.references: 50 [en_US]
dc.description.version: Yes [en_US]
dc.contributor.institution: Vilniaus Gedimino technikos universitetas [en_US]
dc.contributor.institution: Vilnius Gediminas Technical University [en_US]
dc.contributor.faculty: Faculty of Fundamental Sciences [en_US]
dc.contributor.department: Department of Information Technologies [en_US]
dcterms.sourcetitle: 2025 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream), April 24, 2025, Vilnius, Lithuania [en_US]
dc.identifier.eisbn: 9798331598730 [en_US]
dc.identifier.eissn: 2690-8506 [en_US]
dc.publisher.name: IEEE [en_US]
dc.publisher.country: United States of America [en_US]
dc.publisher.city: New York [en_US]
dc.identifier.doi: https://doi.org/10.1109/eStream66938.2025.11016903 [en_US]

