A Roadmap on Developing a Taxonomy for Text Data Mining
Abstract
Over the past decade, the volume of unstructured text data has increased significantly. Text data are used in many areas of scientific research, such as sentiment analysis, semantic analysis, context extraction, and named-entity recognition. Today's widely used Large Language Models (LLMs) are also built on text data. Depending on the task, different methods can be used to analyze text data, such as classification, clustering, or the latest transformer models. In this paper, we present a systematic literature review of text data mining. Scientific articles were collected and analyzed from two scientific databases: Web of Science and Google Scholar. The main aim of the research was to summarize the results, tasks, and methods reported in scientific studies on text data analysis. The types of datasets and the languages of the texts used in those studies were also analyzed. Furthermore, the results of the systematic literature review allowed us to build a taxonomy of text data mining that can be helpful to other researchers.
