dc.contributor.author: Stefanovič, Pavel
dc.contributor.author: Kurasova, Olga
dc.date.accessioned: 2023-09-18T16:12:40Z
dc.date.available: 2023-09-18T16:12:40Z
dc.date.issued: 2022
dc.identifier.issn: 0868-4952
dc.identifier.other: (crossref_id)133672755
dc.identifier.uri: https://etalpykla.vilniustech.lt/handle/123456789/112373
dc.description.abstract: In this paper, a new approach is proposed for multi-label text data class verification and adjustment. The approach supports semi-automated revision of class assignments to improve data quality. Data quality significantly influences the accuracy of the resulting models, for example in classification tasks, and can also matter for other data analysis tasks. The proposed approach combines a text similarity measure with two methods: latent semantic analysis and the self-organizing map. First, the text data are pre-processed by applying various filters that remove unnecessary and irrelevant information. Latent semantic analysis is then used to reduce the dimensionality of the vectors that correspond to each text in the analysed data. Cosine similarity is used to determine which multi-label text data classes should be changed or adjusted. The self-organizing map is the key method for detecting similarity between texts and for deciding on new class assignments. The experimental investigation was performed on newly collected multi-label text data: financial news in the Lithuanian language, collected from four public websites and manually classified by experts into ten classes. Various parameters of the methods were analysed and their influence on the final results was estimated. The final results were validated by experts. The research showed that the proposed approach can help to verify and adjust multi-label text data classes: 82% correct assignments are obtained when the data dimensionality is reduced to 40 using latent semantic analysis and the self-organizing map size is reduced from 40 to 5 in steps of 5.
dc.format: PDF
dc.format.extent: p. 109-130
dc.format.medium: tekstas / txt
dc.language.iso: eng
dc.relation.isreferencedby: Science Citation Index Expanded (Web of Science)
dc.relation.isreferencedby: Scopus
dc.relation.isreferencedby: VINITI
dc.relation.isreferencedby: Zentralblatt MATH (zbMATH)
dc.title: Approach for multi-label text data class verification and adjustment based on self-organizing map and latent semantic analysis
dc.type: Straipsnis Web of Science DB / Article in Web of Science DB
dcterms.license: Creative Commons – Attribution – 4.0 International
dcterms.references: 34
dc.type.pubtype: S1 - Straipsnis Web of Science DB / Web of Science DB article
dc.contributor.institution: Vilniaus Gedimino technikos universitetas
dc.contributor.institution: Vilniaus universitetas
dc.contributor.faculty: Fundamentinių mokslų fakultetas / Faculty of Fundamental Sciences
dc.subject.researchfield: T 007 - Informatikos inžinerija / Informatics engineering
dc.subject.researchfield: N 009 - Informatika / Computer science
dc.subject.vgtuprioritizedfields: IK0303 - Dirbtinio intelekto ir sprendimų priėmimo sistemos / Artificial intelligence and decision support systems
dc.subject.ltspecializations: L106 - Transportas, logistika ir informacinės ir ryšių technologijos (IRT) / Transport, logistic and information and communication technologies
dc.subject.en: multi-label text data
dc.subject.en: clustering
dc.subject.en: self-organizing map
dc.subject.en: latent semantic analysis
dc.subject.en: Lithuanian language
dcterms.sourcetitle: Informatica
dc.description.issue: iss. 1
dc.description.volume: vol. 33
dc.publisher.name: Vilnius University Institute of Data Science and Digital Technologies
dc.publisher.city: Vilnius
dc.identifier.doi: 133672755
dc.identifier.doi: 000766621900005
dc.identifier.doi: 10.15388/22-INFOR473
dc.identifier.elaba: 116179466
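
The abstract in this record names a pipeline of dimensionality reduction, cosine similarity and a self-organizing map (SOM), but the record itself does not give the decision rule. The following is a minimal sketch of the last two steps only, not the authors' implementation. It assumes that the toy 2-dimensional vectors stand in for LSA-reduced document vectors, that a 1 x 2 SOM is enough for the toy data, and that a label is "suspect" when no sufficiently similar document in the same SOM cell carries it. The function names, the threshold of 0.9 and the class names are all hypothetical.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_matching_unit(weights, vec):
    """Grid position of the SOM node whose weight vector is closest to vec."""
    return min(weights, key=lambda node: math.dist(weights[node], vec))


def train_som(data, rows, cols, epochs=50, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs           # decays linearly towards 0
        lr = 0.5 * frac                       # learning rate
        radius = max(rows, cols) / 2 * frac   # neighbourhood radius on the grid
        for vec in data:
            bmu = best_matching_unit(weights, vec)
            for node, w in weights.items():
                grid_dist = math.dist(node, bmu)
                if grid_dist <= radius:
                    influence = math.exp(-grid_dist ** 2 / (2 * radius ** 2))
                    for i in range(dim):
                        w[i] += lr * influence * (vec[i] - w[i])
    return weights


def flag_suspect_labels(vectors, labels, weights, threshold=0.9):
    """Assumed rule: flag labels that no similar document in the same cell has."""
    cells = [best_matching_unit(weights, v) for v in vectors]
    suspects = {}
    for i, (vec, own) in enumerate(zip(vectors, labels)):
        neighbours = [j for j in range(len(vectors))
                      if j != i and cells[j] == cells[i]
                      and cosine_similarity(vec, vectors[j]) >= threshold]
        if not neighbours:
            continue  # nothing to compare against
        supported = set().union(*(labels[j] for j in neighbours))
        unsupported = set(own) - supported
        if unsupported:
            suspects[i] = unsupported
    return suspects


# Toy stand-ins for LSA-reduced document vectors and expert-assigned labels.
vectors = [(1.0, 0.1), (0.9, 0.0), (0.95, 0.05),
           (0.0, 1.0), (0.1, 0.9), (0.05, 0.95)]
labels = [{"banking"}, {"banking", "stocks"}, {"banking"},
          {"stocks"}, {"stocks"}, {"stocks"}]

som = train_som(vectors, rows=1, cols=2)
suspects = flag_suspect_labels(vectors, labels, som)
print(suspects)
```

In this toy data the second document sits with the "banking" documents but also carries "stocks", so the assumed rule singles that label out for expert review. A fuller version would first build the vectors with a term-weighting step and LSA, and would repeat the check over several SOM sizes as the abstract describes.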

