Multi-purpose password dataset generation and its application in decision making for password cracking through machine learning
Abstract
This article proposes a method for generating a multi-purpose password dataset suitable for machine learning and for other research related, directly or indirectly, to passwords. Existing password datasets are poorly suited to machine learning or to decision-driven password cracking. Most are plain password dictionaries that contain only leaked and common passwords, with no additional information. Others are small and include only weak passwords that have previously been leaked. The literature offers many password-cracking methods built on such datasets, but these methods focus mainly on generating more password candidates similar to those in the training dataset. The proposed method combines statistical analysis of leaked passwords with randomness to ensure diversity in the dataset. An experiment with the generated dataset showed a significant improvement in the time required for a dictionary attack, but not for a brute-force attack.