CREDIT CARD FRAUD DETECTION METHODS OF MACHINE LEARNING
DOI:
https://doi.org/10.31891/2307-5732-2024-333-2-30
Keywords:
classifiers, performance metrics, dimensionality reduction, oversampling, random undersampling, logistic regression, fully connected neural network
Abstract
This work studies large volumes of credit card transactions. Fraud detection is a distinctive problem for several reasons: the data are imbalanced (fraudulent transactions usually make up less than 1% of all transactions); fraud scenarios change over time and must be detected quickly; transactions usually contain numerous categorical features; and, because transactions are confidential, publicly available datasets are lacking. All of this complicates both the development of classification methods and the choice of performance evaluation metrics. Threshold-based metrics for evaluating classifiers were examined, and the case for using threshold-free metrics was substantiated. There is currently no consensus on which set of performance metrics should be used. An exploratory analysis of the data was carried out, and two techniques for correcting class imbalance were applied: random undersampling and SMOTE oversampling. Removing outliers was shown to improve the accuracy of the classification methods by more than 3%. Several classifiers were built to identify fraudulent transactions, and the results were processed statistically, which made it possible to assess the adequacy of the classifiers and to determine the parameter values at which they perform best. All classifiers were tested on real data. The logistic regression classifier performed best on both the training and cross-validation sets, and the precision-recall metric was used to assess its effectiveness. Fully connected neural networks with one hidden layer were built for the undersampled and the oversampled data, and their accuracy was compared. The neural network trained on the oversampled data set correctly identified fewer fraudulent transactions than the model trained on the undersampled data set.
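The contrast between threshold-based and threshold-free metrics, and the random undersampling step mentioned in the abstract, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`random_undersample`, `precision_recall`, `average_precision`), the synthetic 1% fraud data, and the toy scores are all assumptions made for illustration, and average precision is used here as one common threshold-free summary of the precision-recall curve.

```python
import random

def random_undersample(rows, labels, seed=0):
    """Keep every fraud row plus an equally sized random subset of normal rows."""
    rng = random.Random(seed)
    fraud = [i for i, y in enumerate(labels) if y == 1]
    normal = [i for i, y in enumerate(labels) if y == 0]
    kept = fraud + rng.sample(normal, min(len(fraud), len(normal)))
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]

def precision_recall(labels, scores, threshold):
    """Threshold-based metrics: precision and recall at a single cut-off."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(labels, scores):
    """Threshold-free summary: mean of the precision at each rank where fraud is found.

    Simplification: tied scores are ranked in input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Hypothetical synthetic data: 1000 transactions, 1% of them fraudulent.
labels = [1] * 10 + [0] * 990
rows = [[float(i)] for i in range(1000)]

# A classifier that always answers "normal" looks accurate but finds no fraud.
always_normal = [0] * len(labels)
accuracy = sum(p == y for p, y in zip(always_normal, labels)) / len(labels)

# Undersampling balances the classes by discarding most normal transactions.
balanced_rows, balanced_labels = random_undersample(rows, labels)

# Toy scores from an imagined classifier, evaluated both ways.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1]
p, r = precision_recall(toy_labels, toy_scores, threshold=0.75)
ap = average_precision(toy_labels, toy_scores)

print(f"accuracy of always-normal baseline: {accuracy:.2f}")
print(f"balanced set size: {len(balanced_labels)}, fraud share: {sum(balanced_labels) / len(balanced_labels):.2f}")
print(f"precision={p:.2f}, recall={r:.2f} at threshold 0.75; average precision={ap:.3f}")
```

The always-normal baseline reaches high accuracy on the imbalanced data while detecting nothing, which is why precision-recall based evaluation is the more informative choice in this setting. SMOTE is not shown because it synthesizes new minority samples by interpolating between neighbours and is more naturally taken from a dedicated library.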