The role of inaccurate assumptions for churn prediction in telecommunications
Abstract
This work addresses a loss of accuracy in machine learning that is caused by inaccurate assumptions. More specifically, it investigates the special case of churn prediction in telecommunications, where the source of the problem is a shift in the definition of a churner. A churner is a user who has stopped using a specific service; in the case considered here, the telecommunication services of a specific operator. The most common exact definition of a churner in telecommunications is a client who has performed no revenue-generating actions for 3 months. However, other authors [1] commonly alter this original definition by shortening the observation period used to identify churners. The motivation is that, for most churners, one month of inactivity is followed by three months of inactivity. In many datasets the definition of a churner is not provided at all, which calls into question the relevance of the problem actually being solved. In this research we investigate the consequences of changing the churn definition: a set of standard machine learning methods is applied to a dataset labelled according to different churn definitions. We show that the resulting prediction inaccuracies are at least of the same order as the differences in performance between machine learning techniques reported by other authors [1]. This calls into question the scientific value of such comparisons when the inaccuracy due to shifts in definition is not addressed.
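To make the effect of a definition shift concrete, the following minimal Python sketch labels the same users under a one-month and a three-month inactivity window and measures how often the two labellings disagree. It is an illustration only, not the labelling procedure used in this research: the function names, the toy data, and the approximation of one and three months by 30 and 90 days are all assumptions.

```python
from datetime import date


def label_churners(last_activity, reference_date, inactivity_days):
    """Flag a user as churned when the last revenue-generating action
    lies at least `inactivity_days` before `reference_date`."""
    return {
        user: (reference_date - last_seen).days >= inactivity_days
        for user, last_seen in last_activity.items()
    }


def label_disagreement(labels_a, labels_b):
    """Fraction of users whose churn label differs between two definitions."""
    differing = sum(labels_a[user] != labels_b[user] for user in labels_a)
    return differing / len(labels_a)


# Hypothetical toy data: date of the last revenue-generating action per user.
last_activity = {
    "u1": date(2020, 1, 5),
    "u2": date(2020, 3, 1),
    "u3": date(2020, 4, 20),
    "u4": date(2020, 2, 10),
}
reference_date = date(2020, 5, 1)

# Assumption: one month is approximated by 30 days, three months by 90 days.
churn_1m = label_churners(last_activity, reference_date, inactivity_days=30)
churn_3m = label_churners(last_activity, reference_date, inactivity_days=90)

disagreement = label_disagreement(churn_1m, churn_3m)
print(f"Share of users labelled differently: {disagreement:.2f}")
```

In this toy example two of the four users are churners under the shortened definition but not under the three-month one. A model trained and evaluated on the shortened labels would therefore be solving a different problem from one trained on the original labels.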