IMPROVED BINARY DATA BLOCK CLASSIFICATION MODEL FOR FILE CARVING PROBLEMS

VIACHESLAV MOSKALENKO; MAKSYM BOIKO

doi:10.31891/2307-5732-2025-353-1

Authors

VIACHESLAV MOSKALENKO, Sumy State University, https://orcid.org/0000-0001-6275-9803
MAKSYM BOIKO, Sumy State University; The National Anti-corruption Bureau of Ukraine, https://orcid.org/0000-0003-0950-8399


Keywords:

artificial intelligence, machine learning, neural network, classification, identification, data analysis, dataset, information technology

Abstract

In this paper, we address the problem of classifying blocks of binary data as an integral stage in the file-carving process under conditions of high fragmentation. Existing models and methods exhibit high error rates that vary with a range of factors, and in real-world applications the data to be analyzed often differ from the data in training datasets. The aim of this study is to improve the effectiveness of binary-data-block classification models and to overcome the challenges involved in detecting fragments of non-target file types.

To that end, we have enhanced existing file-fragment identification models by introducing an additional classifier head responsible for constructing class prototypes in a discrete feature space. During training, this auxiliary branch serves to regularize the feature space for both target and non-target file fragments. At the same time, class-specific boundaries (or containers) are established to enable the detection of data that fall outside the training distribution.
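The abstract does not give the exact decision rule, so the following is only a minimal sketch of how prototypes in a binary feature space and class-specific containers can be combined to reject out-of-distribution blocks. It assumes (hypothetically) that a container is a Hamming ball of a per-class radius around each prototype; the helper names `binarize`, `hamming`, `classify`, the 8-bit prototypes, the labels "jpg"/"pdf" and the radii are all illustrative inventions, not the authors' code or values.

```python
# Hypothetical sketch (not the authors' implementation): nearest-prototype
# classification in a binary feature space with per-class Hamming "containers".
from typing import Dict, List, Optional, Sequence


def binarize(features: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Map a real-valued embedding to a binary code: 1 where the value exceeds threshold."""
    return [1 if x > threshold else 0 for x in features]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count positions at which two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def classify(code: Sequence[int],
             prototypes: Dict[str, Sequence[int]],
             radii: Dict[str, int]) -> Optional[str]:
    """Return the nearest prototype's label if the code lies inside that class's
    container (Hamming ball of radius radii[label]); otherwise None (rejected)."""
    best_label: Optional[str] = None
    best_dist = 0
    for label, proto in prototypes.items():
        d = hamming(code, proto)
        if best_label is None or d < best_dist:
            best_label, best_dist = label, d
    if best_label is not None and best_dist <= radii[best_label]:
        return best_label
    # Outside the nearest container: treat as a non-target / out-of-distribution block.
    return None


# Hypothetical 8-bit prototypes and container radii, chosen only for illustration.
PROTOTYPES = {"jpg": [1, 1, 0, 0, 1, 0, 1, 0], "pdf": [0, 0, 1, 1, 0, 1, 0, 1]}
RADII = {"jpg": 2, "pdf": 2}

if __name__ == "__main__":
    code = binarize([0.7, 0.2, -0.1, -0.5, 0.9, -0.3, 0.4, -0.8])
    print(classify(code, PROTOTYPES, RADII))                      # jpg
    print(classify([1, 1, 1, 1, 0, 0, 0, 0], PROTOTYPES, RADII))  # None
```

The second call shows the rejection path: the code is 4 bits away from both prototypes, which exceeds both radii, so no target class is assigned. In a trained model the prototypes and container sizes would come from the auxiliary classifier head rather than being hand-picked.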

Experimental evaluation of the proposed models demonstrated accuracy gains of 1.9 % to 3.1 % compared with baseline approaches, depending on the application scenario. Overall accuracy for identifying fragments of target file types partitioned into 5, 11, and 25 classes ranged from 88 % to 98 %, 53 % to 100 %, and 72 % to 100 %, respectively. Training also yielded an increase in the Hamming distance between prototype vectors in the binary feature space within the regularizing classifier head. The macro-averaged F1 scores were 91.78 %, 59.97 %, and 82.94 % for the 5-, 11-, and 25-class scenarios, respectively. The lower performance in the 11-class case appears to result from significant overlap between classes in the feature space. Thus, the introduction of a regularizing classifier branch in the proposed models leads to higher classification accuracy for binary-data blocks, although the results depend on the chosen class-partitioning scheme.
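The evaluation figures above rely on two standard quantities: the macro-averaged F1 score (the unweighted mean of per-class F1) and the Hamming distance between prototype codes. A minimal stdlib sketch of their conventional definitions follows; it does not reproduce the reported numbers, and the function names and toy inputs are hypothetical. Whether the authors count rejected (out-of-distribution) predictions as a separate label in the macro average is not stated; in this sketch a rejected prediction could be encoded as a label such as "unknown", which would then enter the average as its own class.

```python
# Hypothetical sketch: conventional macro-averaged F1 and prototype-separation
# statistics; not taken from the paper's code.
from itertools import combinations
from typing import Dict, Sequence, Tuple


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have equal length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    total = 0.0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1
    return total / len(labels)


def prototype_separation(prototypes: Dict[str, Sequence[int]]) -> Tuple[int, float]:
    """Minimum and mean pairwise Hamming distance between class prototype codes."""
    dists = [sum(a != b for a, b in zip(p, q))
             for p, q in combinations(prototypes.values(), 2)]
    if not dists:
        raise ValueError("need at least two prototypes")
    return min(dists), sum(dists) / len(dists)


if __name__ == "__main__":
    print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))  # approx. 0.733 (= 11/15)
    print(prototype_separation({"x": [0, 0, 0, 0], "y": [1, 1, 0, 0], "z": [1, 1, 1, 1]}))  # (2, approx. 2.667)
```

Because every class contributes equally to the macro average, a few poorly separated classes (as in the 11-class scenario) pull the score down even when the remaining classes are identified almost perfectly. A growing minimum pairwise distance is one simple way to track the prototype separation that the abstract reports.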
