Detecting main topics using dictionary-based topic analysis
Abstract
This paper describes dictionary-based software for topic analysis written by the author. The dictionary was created manually. Many studies have shown the advantages of using dictionaries to analyze texts. The software described here works for English and Italian and does not use probabilistic methods. In natural language processing, the use of a lexicon to reveal topics in a text is often avoided. Topics depend heavily on context, and assigning each word to a single topic does not help to identify topics in different contexts. However, the software described in the paper, which uses a dictionary of about 5,500 topic words, in many cases allows the same word to fall into different topics. This approach makes it possible to find the main topics in a text, which correspond to the most frequent topic words detected by the software. Advantages and disadvantages are discussed in the paper, along with examples. The software was extensively tested on large texts, such as Internet news corpora and classics of English and American literature, and showed very high reliability in detecting the main topics. Analysis of topics in literary works leads to almost the same conclusions as those reached by critics.
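The general idea of counting topic words from a dictionary in which one word may belong to several topics can be illustrated with a minimal Python sketch. This is not the author's software: the toy dictionary, the topic labels, the tokenization rule, and the function name `main_topics` are all hypothetical, and the sketch assumes that every dictionary hit adds one count to each topic the word belongs to.

```python
import re
from collections import Counter

# Hypothetical toy dictionary (not the 5,500-word dictionary from the paper).
# A word may map to more than one topic, e.g. "bank".
TOPIC_DICTIONARY = {
    "bank": {"finance", "nature"},
    "loan": {"finance"},
    "money": {"finance"},
    "river": {"nature"},
    "forest": {"nature"},
}


def main_topics(text, dictionary, top_n=1):
    """Return the top_n topics ranked by the number of topic-word hits.

    Assumption: each occurrence of a dictionary word adds one count
    to every topic that word is listed under.
    """
    counts = Counter()
    # Simple lowercase tokenization on ASCII letters (an illustrative choice).
    for token in re.findall(r"[a-z]+", text.lower()):
        for topic in dictionary.get(token, ()):
            counts[topic] += 1
    return counts.most_common(top_n)


sample = "The bank approved the loan, and the money arrived the next day."
result = main_topics(sample, TOPIC_DICTIONARY)
print(result)  # [('finance', 3)]
```

In this sketch the ambiguous word "bank" contributes to both "finance" and "nature", and the surrounding topic words ("loan", "money") are what push "finance" to the top.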