An Approach for Building an IT Support Dataset for Machine Learning Models
Abstract
This study investigates the challenges of preparing datasets for machine learning models from the data of a centralized system for managing IT incidents within an organization. Key challenges include data quality issues, class imbalance, the need for anonymization, and redundant information. Various data preparation techniques are analyzed, including handling missing values, encoding categorical and textual data, balancing datasets, anonymizing sensitive information, and performing feature selection. By examining the Service Desk incident data of a state enterprise, the paper highlights the structural complexities of this data and the difficulties of processing it. Furthermore, the impact of data engineering and cleaning techniques on the accuracy and reliability of machine learning models is assessed. Finally, specific techniques to improve data preparation and to optimize model performance are analyzed.
