An Approach for Building IT Support Dataset for Machine Learning Models

Jevsejev, Roman; Mažeika, Dalius; Bereiša, Mindaugas

Date

2025

Author

Jevsejev, Roman

Mažeika, Dalius

Bereiša, Mindaugas

Abstract

This study investigates the challenges of preparing machine learning datasets from the data of a centralized IT incident management system within an organization. Key challenges include data quality issues, class imbalance, the need for anonymization, and redundant information. Various data preparation techniques are analyzed, including handling missing values, encoding categorical and textual data, balancing datasets, anonymizing sensitive information, and performing feature selection. By examining a state enterprise's Service Desk incident data, the paper highlights the structural complexities and processing difficulties of such data. Furthermore, the impact of data engineering and cleaning techniques on the accuracy and reliability of machine learning models is assessed. Finally, specific techniques for improving data preparation and optimizing model performance are analyzed.

Issue date (year)

2025

URI

https://etalpykla.vilniustech.lt/handle/123456789/159726

Collections

2025 International Conference "Electrical, Electronic and Information Sciences" (eStream) [51]