Towards a robust method of dataset generation of malicious activity on a Windows-based operating system for anomaly-based HIDS training
Abstract
Classical cyber-attack detection methods, based on signatures and rules, show stagnation and an inability to counter zero-day attacks, advanced persistent threats (APTs), and similar attacks. Anomaly-based detection methods, although they have been used for a number of years, still produce large numbers of false positives (valid user or application behavior that is classified as an intrusion) and work reliably only under relatively stable conditions. The progress achieved in recent years in deep learning techniques offers the potential to renew research on the topic and to achieve promising results. Anomaly-based intrusion detection systems (IDS) rely on the ability to learn from a training set of legitimate and malicious actions, and training them requires an enormous amount of data. The majority of available IDS training datasets relate to network-level intrusion detection, while datasets for training host-based intrusion detection systems (HIDS), which are becoming extremely important, are unavailable or incomplete and lack important features. In this article we propose a method for the automated generation of system-level anomaly datasets, to be used in the subsequent training of artificial-intelligence-based HIDS. Details of the method's implementation are presented, and test results are discussed.
