In this presentation, we will demonstrate how to build a machine learning model trained on a merged dataset that combines cyber-related contextual information with Bitcoin (BTC) transaction data. Security professionals in both the private and public sectors who work in the cryptocurrency field can use the model to deny business to certain BTC addresses or to build legal cases for returning stolen coins.
To build the dataset, we collected a list of BTC addresses involved in illegal activities. Using these addresses as a starting point, we navigated along the chain and reconstructed a cluster of connected “dirty” addresses, using rules such as First-In-First-Out (FIFO) to label them. These labeling techniques tag BTC addresses that fall along this path as “dirty” because they handled money acquired through illegal activities. We then take this a step further and analyze the characteristic behavior of these addresses. This behavioral analysis lets us determine the features that represent malicious behavior and use them in a machine learning model that classifies new BTC addresses.
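As a rough illustration of the FIFO labeling rule mentioned above, the sketch below propagates "dirty" value through a single transaction. It assumes one common reading of FIFO: input value is consumed in input order and assigned to outputs in output order, with any leftover treated as the fee. The function name `fifo_taint`, the `(amount, is_dirty)` input representation, and the use of integer satoshi amounts are illustrative assumptions, not details taken from the talk.

```python
from typing import List, Tuple


def fifo_taint(inputs: List[Tuple[int, bool]], outputs: List[int]) -> List[int]:
    """Return the dirty amount assigned to each output under a FIFO rule.

    inputs:  (amount, is_dirty) pairs in transaction input order (hypothetical format).
    outputs: output amounts in transaction output order.
    """
    # Mutable copy of the inputs, consumed from the front like a queue.
    queue = [[amount, is_dirty] for amount, is_dirty in inputs]
    head = 0
    dirty_per_output = []

    for out_amount in outputs:
        remaining = out_amount
        dirty = 0
        # Fill this output with the earliest unspent input value.
        while remaining > 0 and head < len(queue):
            take = min(remaining, queue[head][0])
            if queue[head][1]:
                dirty += take
            queue[head][0] -= take
            remaining -= take
            if queue[head][0] == 0:
                head += 1  # this input is fully consumed
        dirty_per_output.append(dirty)

    # Any input value left in the queue is the miner fee and is ignored here.
    return dirty_per_output


def label_outputs(inputs: List[Tuple[int, bool]], outputs: List[int]) -> List[bool]:
    """Label an output as dirty if any dirty value reaches it (assumed rule)."""
    return [dirty > 0 for dirty in fifo_taint(inputs, outputs)]
```

Applying a function like this transaction by transaction, starting from the known illegal addresses, would give one way to walk the chain and build the cluster of connected addresses. Whether an output counts as dirty at any nonzero amount or only above some threshold is a design choice left open here.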
Our model-building approach is based on a three-part framework. The first part is to collect a set of BTC addresses and classify them as “clean” or “dirty” to serve as our ground truth. The second part is to test classification models on this dataset and propose decision metrics for picking the best model. In this part, we will also discuss ideas for computing features that are expensive to obtain from transaction data but important. In the third part, we will show how to use the selected model to predict whether an address is “dirty”. Finally, we will discuss the challenges we faced in solving this problem and propose solutions to overcome them.
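The second and third parts of the framework can be sketched in a few lines. The abstract does not say which decision metrics are proposed, so the example below uses the F1 score purely as a stand-in. The names `f1_score` and `pick_best_model`, and the representation of a classifier as a plain function from a feature dictionary to a boolean, are hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, float]            # hypothetical per-address feature vector
Classifier = Callable[[Features], bool]  # returns True when an address looks dirty


def f1_score(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Harmonic mean of precision and recall for the dirty (True) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_best_model(
    candidates: Dict[str, Classifier],
    features: List[Features],
    labels: List[bool],
) -> Tuple[str, float]:
    """Score each candidate on labeled ground truth and return the best one."""
    scores = {
        name: f1_score(labels, [clf(x) for x in features])
        for name, clf in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

In practice the labeled addresses would be split so that models are scored on data they were not fit on, and the selected classifier would then be applied to the features of a new address to predict whether it is dirty.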