Machine Learning and Statistical Techniques for Outlier Detection in Smart Home Energy Consumption
Date
2024
Author
Krishna, N. Sri
Pavan Kumar, Y. V.
Prakash, K. Purna
Pradeep Reddy, G.
Abstract
With the continuous growth of smart home culture worldwide, large volumes of energy consumption data have gained the attention of data scientists. Smart meters capture energy consumption readings at a predefined rate and store them in a database. High-quality databases are essential for accurate analysis and decision-making. However, these readings often contain anomalies, namely missingness, redundancy, and outliers, caused by issues in the meter or data communication networks. Among these, outlier readings indicate abnormal load behavior (e.g., nonlinearity, unpredicted load switching, system faults). Hence, it is essential to detect and visualize such anomalies so that they can be treated appropriately. With this motivation, this paper implements several key machine learning and statistical techniques, namely autoregressive integrated moving average (ARIMA), autoencoder, density-based spatial clustering of applications with noise (DBSCAN), isolation forest, k-means, hierarchical density-based spatial clustering of applications with noise (HDBSCAN), one-class support vector machine (SVM), local outlier factor (LOF), long short-term memory (LSTM), winsorization, interquartile range (IQR), and Z-score. The results revealed that DBSCAN consistently demonstrated the most accurate performance in detecting outliers in energy data, while Z-score, IQR, and winsorization provided reasonable outcomes but were limited in handling complex and non-linear data patterns. Autoencoder, isolation forest, and one-class SVM showed moderate success, but their performance depended on the specific dataset characteristics. K-means exhibited mixed results. ARIMA, LOF, LSTM, and HDBSCAN had limited success in detecting outliers in the time-series data. Thus, this analysis recommends DBSCAN as the best technique, as it consistently outperformed the other machine learning and statistical techniques in accurately detecting outliers in smart home energy consumption data.
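To make four of the techniques named in the abstract concrete, the following is a minimal sketch of Z-score, IQR, winsorization, and DBSCAN applied to a short list of meter readings. It is not the authors' implementation. The readings are invented for illustration, and the function names, the thresholds (3.0, 1.5, the 5th and 95th percentiles), and the DBSCAN parameters `eps` and `min_samples` are all assumptions. The readings are treated as plain one-dimensional values, ignoring their time order, which may differ from the feature set used in the paper.

```python
import statistics

# Hypothetical readings in kWh; index 12 is an injected spike (illustration only).
readings = [0.42, 0.45, 0.40, 0.47, 0.43, 0.44, 0.41, 0.46, 0.43, 0.45,
            0.42, 0.44, 5.80, 0.43, 0.41, 0.46, 0.44, 0.42, 0.45, 0.43]


def zscore_outliers(values, threshold=3.0):
    """Indices whose absolute Z-score exceeds the (assumed) threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # constant series: nothing can be flagged
    return [i for i, x in enumerate(values) if abs(x - mean) / std > threshold]


def iqr_outliers(values, k=1.5):
    """Indices outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, x in enumerate(values) if x < low or x > high]


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clip values to the given percentiles instead of flagging them."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(x, low), high) for x in values]


def dbscan_outliers(values, eps=0.05, min_samples=4):
    """Indices that DBSCAN labels as noise (1-D, absolute-difference distance)."""
    n = len(values)
    # Each point's eps-neighbourhood includes the point itself.
    neighbors = [[j for j in range(n) if abs(values[i] - values[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_samples for nb in neighbors]
    labels = [None] * n  # None means "not assigned to any cluster"
    cluster = 0
    for i in range(n):
        if labels[i] is not None or not core[i]:
            continue
        # Grow a new cluster outward from this unassigned core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] is None:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    # Points never reached from a core point are noise, i.e. outliers.
    return [i for i in range(n) if labels[i] is None]


if __name__ == "__main__":
    print("Z-score:", zscore_outliers(readings))
    print("IQR:", iqr_outliers(readings))
    print("DBSCAN:", dbscan_outliers(readings))
    print("Winsorized max:", max(winsorize(readings)))
```

Z-score and IQR place a single global cutoff on the values, and winsorization clips extremes without flagging them. DBSCAN instead marks a reading as noise when it has too few neighbours within `eps`, a criterion based on local density. On real data, `eps` and `min_samples` would need tuning to the meter's sampling rate and load profile.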
