Abstract:
This study aims to develop scalable methods for detecting suspicious wallets from historical transaction data in cryptocurrency networks such as Ethereum and Bitcoin. We construct a transaction network for each wallet dataset from illicit wallet addresses collected from sources scattered across the internet. Egonet-dependent and egonet-independent features are used with a range of machine learning techniques, including logistic regression (LR), random forest (RF), and XGBoost (XGB), to predict illicit wallets. First, we analyze how well the models detect suspicious wallets in two datasets containing wallets of the Bitcoin mixer services Bitcoinfog and Helix. The area under the ROC curve (AUC) exceeds 99% for the XGB models. The models perform better on Helix wallets than on Bitcoinfog wallets in terms of precision, recall, F1 score, and AUC. Second, egonet-dependent features do not significantly improve model performance; hence, the best-performing models use only egonet-independent features. Third, on the Bitcoin dataset, whose wallets do not use any mixer service, we also obtain an AUC above 99%. Although model performance is similar across these three datasets, the dominant features, as ranked by feature importance, differ between the mixer-service datasets (Helix, Bitcoinfog) and the non-mixer dataset (Bitcoin). Finally, using the same feature set as for the Bitcoin, Bitcoinfog, and Helix datasets, we train the same machine learning models on the Ethereum dataset and obtain 96% AUC. We repeat the experiments with varying degrees of class imbalance to simulate real-life conditions and observe that AUC declines by up to 0.10 as the class imbalance becomes more severe.
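
To make the distinction between the two feature families concrete, the following is a minimal illustrative sketch, not the feature set used in the study. It assumes that transactions are given as (sender, receiver, amount) tuples and that an egonet is a wallet together with its direct neighbours. The function name `wallet_features` and all feature names are hypothetical choices for illustration: degree and volume features are computed from the wallet's own transactions only (egonet-independent), while the egonet size and edge count depend on the wallet's neighbourhood (egonet-dependent).

```python
from collections import defaultdict


def wallet_features(transactions):
    """Compute illustrative per-wallet features.

    transactions: iterable of (sender, receiver, amount) tuples.
    Feature names are hypothetical, chosen only for this sketch.
    """
    out_nbrs = defaultdict(set)   # wallet -> wallets it sent funds to
    in_nbrs = defaultdict(set)    # wallet -> wallets it received funds from
    sent = defaultdict(float)     # wallet -> total amount sent
    received = defaultdict(float) # wallet -> total amount received

    for sender, receiver, amount in transactions:
        out_nbrs[sender].add(receiver)
        in_nbrs[receiver].add(sender)
        sent[sender] += amount
        received[receiver] += amount

    wallets = set(out_nbrs) | set(in_nbrs)
    features = {}
    for w in wallets:
        # Assumed egonet definition: the wallet plus its 1-hop neighbours.
        ego = {w} | out_nbrs[w] | in_nbrs[w]
        # Distinct directed edges with both endpoints inside the egonet.
        ego_edges = sum(1 for u in ego for v in out_nbrs[u] if v in ego)
        features[w] = {
            # Egonet-independent: use only the wallet's own transactions.
            "out_degree": len(out_nbrs[w]),
            "in_degree": len(in_nbrs[w]),
            "total_sent": sent[w],
            "total_received": received[w],
            # Egonet-dependent: require the neighbourhood structure.
            "egonet_size": len(ego),
            "egonet_edges": ego_edges,
        }
    return features


# Toy data, invented for illustration only.
example_txs = [("A", "B", 1.0), ("B", "C", 2.0), ("C", "A", 0.5), ("A", "D", 3.0)]
features = wallet_features(example_txs)
print(features["A"])
```

In a sketch of this kind, the egonet-dependent features are the more expensive ones to compute, since they require traversing each wallet's neighbourhood rather than a single pass over its own transactions.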