Trends and Challenges of Multimodal Solutions for Text and Image Context Extraction

Date
2025
Author
Kvietkauskas, Tautvydas
Stefanovič, Pavel
Abstract
The volume of images, texts, and other information published in today's digital space is increasing rapidly. The ability to process textual and visual information simultaneously helps to interpret content more accurately and enables the application of artificial intelligence in complex situations, such as contextual analysis and real-time monitoring of social networks. Today's Large Language Models (LLMs) are based on text data, Object Recognition (OR) models on visual data, and Vision-Language (VL) models on both text and visual data. Combinations of these models can be used to create multimodal solutions for various context-extraction tasks; such solutions take both images and texts as input. This paper systematically reviews the latest research on existing LLM, OR, and VL models. The reviewed scientific articles were drawn from the Web of Science and Google Scholar databases. All the papers are freely accessible and date from 2019 to 2024. The main objective was to summarise the results, tasks, and methods of research on text, image, and image-text data analysis. The types of datasets and the languages of the texts used in the research were also reviewed. The results also highlight trends and challenges in the context-extraction field that may be useful to other researchers.
URI
https://etalpykla.vilniustech.lt/handle/123456789/159684
Collections
  • 2025 International Conference "Electrical, Electronic and Information Sciences" (eStream)

 

 
