Text Classification using Different Feature Extraction Approaches

Dzisevič, Robert; Šešok, Dmitrij

Date

2019

Authors

Dzisevič, Robert

Šešok, Dmitrij

Abstract

In this paper, we compare three text feature extraction approaches for classifying short sentences and phrases into categories with a neural network, in order to determine which method best captures text features and allows the classifier to achieve the highest accuracy. The examined methods are plain Term Frequency-Inverse Document Frequency (TF-IDF) and two modifications of it that apply different dimensionality reduction techniques: Latent Semantic Analysis (LSA) and Linear Discriminant Analysis (LDA). The results show that plain TF-IDF outperforms the other methods, allowing the classifier to achieve the highest accuracy when working with larger datasets. Furthermore, the results show that TF-IDF combined with LSA allows the classifier to achieve similar accuracy when working with smaller datasets.
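As a rough illustration of the plain TF-IDF step named in the abstract, the following is a minimal sketch in Python using only the standard library. The weighting variant (relative term frequency multiplied by log(N / df)), the function name `tfidf`, and the toy documents are assumptions made for illustration; the abstract does not state which TF-IDF formulation, tokenizer, or dataset the authors used, and the LSA, LDA, and neural network stages are not shown.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return (vocabulary, matrix) of TF-IDF weights for tokenized documents.

    Assumed variant: tf = term count / document length,
    idf = log(number of documents / document frequency).
    """
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in documents:
        df.update(set(doc))

    vocabulary = sorted(df)
    n_docs = len(documents)

    matrix = []
    for doc in documents:
        counts = Counter(doc)
        row = []
        for term in vocabulary:
            tf = counts[term] / len(doc) if doc else 0.0
            idf = math.log(n_docs / df[term])
            row.append(tf * idf)
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical short phrases, tokenized by whitespace.
docs = [
    "cheap flights to vilnius".split(),
    "cheap hotel in vilnius".split(),
    "neural network classifier".split(),
]
vocabulary, matrix = tfidf(docs)
```

Each row of `matrix` is one document's feature vector, with one column per vocabulary term. In the setup the abstract describes, vectors of this kind would either be passed to the classifier directly or first reduced in dimensionality with LSA or LDA.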

Publication date (year)

2019

URI

https://etalpykla.vilniustech.lt/handle/123456789/159518

Collections

2019 International Conference "Electrical, Electronic and Information Sciences" (eStream)