A combined approach for multi-label text data classification

Štrimaitis, Rokas; Stefanovič, Pavel; Ramanauskaitė, Simona; Slotkienė, Asta

dc.contributor.author	Štrimaitis, Rokas
dc.contributor.author	Stefanovič, Pavel
dc.contributor.author	Ramanauskaitė, Simona
dc.contributor.author	Slotkienė, Asta
dc.date.accessioned	2023-09-18T16:17:36Z
dc.date.available	2023-09-18T16:17:36Z
dc.date.issued	2022
dc.identifier.issn	1687-5265
dc.identifier.uri	https://etalpykla.vilniustech.lt/handle/123456789/112866
dc.description.abstract	Automated data analysis solutions depend heavily on data and its quality. The possibility of assigning more than one class to the same data item is one of the specifics that must be taken into account. There are no solutions dedicated to Lithuanian text data classification that help assign more than one class to a data item. In this paper, a new combined approach is proposed for multilabel text data classification. The main aim of the proposed approach is to improve the accuracy of traditional classification algorithms by incorporating the results obtained using similarity measures. The experimental investigation was performed using multilabel financial news text data in the Lithuanian language. The data were collected from four public websites and manually classified by experts into ten classes, with each data item assigned no more than two classes. The results of five commonly used classification algorithms were compared on this dataset: the support vector machine, multinomial naive Bayes, k-nearest neighbours, decision trees, and linear discriminant analysis. In addition, two similarity measures were compared: the cosine distance and the Dice coefficient. The best results were obtained using the cosine similarity distance and the multinomial naive Bayes classifier. The proposed approach combines the results of these two methods. Research on different cases of the proposed approach indicated the peculiarities of its application. At the same time, the combined approach yielded a statistically significant increase in global accuracy.	eng
dc.format	PDF
dc.format.extent	p. 1-13
dc.format.medium	tekstas / txt
dc.language.iso	eng
dc.relation.isreferencedby	Science Citation Index Expanded (Web of Science)
dc.relation.isreferencedby	Scopus
dc.relation.isreferencedby	INSPEC
dc.relation.isreferencedby	MEDLINE
dc.relation.isreferencedby	ProQuest Central
dc.relation.isreferencedby	PubMed
dc.source.uri	https://www.hindawi.com/journals/cin/2022/3369703/
dc.title	A combined approach for multi-label text data classification
dc.type	Straipsnis Web of Science DB / Article in Web of Science DB
dcterms.accessRights	This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
dcterms.license	Creative Commons – Attribution – 4.0 International
dcterms.references	28
dc.type.pubtype	S1 - Straipsnis Web of Science DB / Web of Science DB article
dc.contributor.institution	Vilniaus Gedimino technikos universitetas
dc.contributor.faculty	Fundamentinių mokslų fakultetas / Faculty of Fundamental Sciences
dc.subject.researchfield	T 007 - Informatikos inžinerija / Informatics engineering
dc.subject.vgtuprioritizedfields	IK0303 - Dirbtinio intelekto ir sprendimų priėmimo sistemos / Artificial intelligence and decision support systems
dc.subject.ltspecializations	L106 - Transportas, logistika ir informacinės ir ryšių technologijos (IRT) / Transport, logistic and information and communication technologies
dc.subject.en	multi-label text data
dc.subject.en	similarity distance
dc.subject.en	classification
dc.subject.en	Lithuanian language
dc.subject.en	financial text data
dcterms.sourcetitle	Computational intelligence and neuroscience
dc.description.volume	vol. 2022
dc.publisher.name	Hindawi
dc.publisher.city	London
dc.identifier.doi	000820934600009
dc.identifier.doi	10.1155/2022/3369703
dc.identifier.elaba	134438204
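
The two similarity measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace-free token lists, and the use of raw term counts (rather than any weighting the paper may apply) are assumptions made purely for illustration.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts.

    Assumption: documents are already tokenized; no TF-IDF weighting is applied.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Dot product over the terms that occur in both documents.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def dice_coefficient(tokens_a, tokens_b):
    """Dice coefficient over the sets of distinct tokens: 2|A ∩ B| / (|A| + |B|)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty documents
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


# Hypothetical toy documents (not taken from the paper's dataset).
doc_1 = ["bank", "rates", "rise"]
doc_2 = ["bank", "rates", "fall"]
print(cosine_similarity(doc_1, doc_2))
print(dice_coefficient(doc_1, doc_2))
```

Both measures score overlap between two documents; cosine similarity takes term frequencies into account, while the Dice coefficient as written here only considers which distinct terms are shared.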

This item appears in the following Collection(s)

Straipsniai Web of Science ir/ar Scopus referuojamuose leidiniuose / Articles in Web of Science and/or Scopus indexed sources [7946]
