METHOD FOR CLASSIFICATION OF CONFIDENTIAL INFORMATION USING MACHINE LEARNING

BOHDAN PALIYCHUK; EDUARD MANZIUK; TETIANA SKRYPNYK; OLEKSANDR PASICHNYK

doi:10.31891/2307-5732-2025-357-13

Authors

BOHDAN PALIYCHUK, Khmelnytskyi National University
EDUARD MANZIUK, Khmelnytskyi National University, https://orcid.org/0000-0002-7310-2126
TETIANA SKRYPNYK, Khmelnytskyi National University, https://orcid.org/0000-0002-8531-5348
OLEKSANDR PASICHNYK, Khmelnytskyi National University, https://orcid.org/0000-0002-8760-4688

Keywords:

data classification, confidential information, machine learning, Naive Bayes classifier, Laplace smoothing, SVM, information security

Abstract

This article focuses on the development and improvement of a method for classifying confidential information through machine-learning analysis of text data. The primary goal of the research was to create an effective tool for automatically detecting sensitive information in structured and unstructured data so that it can be better protected. The method uses a Naive Bayes classifier with Laplace smoothing, with a Support Vector Machine (SVM) included for comparative evaluation. The data used for training and testing combined real corporate data, public datasets, and synthetic examples to make the dataset diverse and representative.
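The abstract does not include the authors' implementation. As an illustration only, the following is a minimal from-scratch sketch of a multinomial Naive Bayes text classifier with Laplace (add-one) smoothing, assuming documents are already tokenized and labeled either `confidential` or `public`. The class name `LaplaceNaiveBayes`, the two labels, and the toy training sentences are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


class LaplaceNaiveBayes:
    """Multinomial Naive Bayes over token lists with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # alpha = 1.0 is classic Laplace (add-one) smoothing

    def fit(self, docs: list[list[str]], labels: list[str]) -> LaplaceNaiveBayes:
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)  # class -> token -> count
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.total_tokens = {c: sum(self.token_counts[c].values()) for c in self.classes}
        self.n_docs = len(labels)
        return self

    def log_likelihood(self, token: str, cls: str) -> float:
        # P(token | cls) = (count + alpha) / (total + alpha * |V|), which stays
        # non-zero even for tokens never seen with this class in training.
        num = self.token_counts[cls][token] + self.alpha
        den = self.total_tokens[cls] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict_one(self, doc: list[str]) -> str:
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / self.n_docs)  # log prior
            for tok in doc:
                if tok in self.vocab:  # tokens unseen in any training doc are skipped
                    score += self.log_likelihood(tok, c)
            scores[c] = score
        return max(scores, key=scores.get)

    def predict(self, docs: list[list[str]]) -> list[str]:
        return [self.predict_one(d) for d in docs]


# Toy, hypothetical training data (not the study's dataset).
train_docs = [
    "passport number and salary of employee".split(),
    "confidential contract with client bank account".split(),
    "public press release about new office".split(),
    "company picnic schedule for all staff".split(),
]
train_labels = ["confidential", "confidential", "public", "public"]

model = LaplaceNaiveBayes(alpha=1.0).fit(train_docs, train_labels)
print(model.predict(["employee passport scan".split(), "press release schedule".split()]))
# prints ['confidential', 'public']
```

In this toy data the word `passport` never occurs in a `public` document. With `alpha=0`, P(passport | public) would be zero and `math.log` would raise; with add-one smoothing it becomes (0 + 1) / (12 + 24) = 1/36, so one rare token can no longer veto a class on its own.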
The developed method achieves an accuracy of 92%, a recall of 90%, and an F1-score of 91%, outperforming the baseline approaches: a standard Naive Bayes classifier (84% accuracy) and SVM (88% accuracy). Particular attention was paid to rare or unique data classes, and applying Laplace smoothing to them significantly improved the model's robustness and stability. This enables more reliable identification of confidential data, which is critical for ensuring information security in organizations.
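The reported accuracy, recall, and F1-score can be computed from true and predicted labels as sketched below. The abstract does not say whether recall and F1 are computed for the confidential class alone or averaged over classes; this sketch assumes binary labels with `confidential` as the positive class. The function name and the example labels are hypothetical and are not the study's data.

```python
def classification_metrics(y_true, y_pred, positive="confidential"):
    """Accuracy over all labels; precision, recall and F1 for the `positive` class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # confidential, caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # confidential, missed
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for five documents.
y_true = ["confidential", "confidential", "confidential", "public", "public"]
y_pred = ["confidential", "confidential", "public", "public", "confidential"]
print(classification_metrics(y_true, y_pred))
```

Here 3 of 5 labels are correct (accuracy 0.6), and 2 of the 3 truly confidential documents are flagged (recall 2/3). Recall matters most when a missed confidential document means a potential leak; F1 balances it against precision, i.e. against false alarms.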
The article also details the data preparation process, including cleaning, tokenization, normalization, and splitting into training, validation, and test sets. The effectiveness of the different algorithms was compared using accuracy, recall, and F1-score. Directions for further improvement include expanding functionality, improving scalability, and adapting the method to the requirements of specific industries.
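The preparation steps listed in the abstract might look roughly like the following sketch. The specific cleaning rules (dropping HTML-like tags, masking digits), the 70/15/15 split ratios, the fixed seed, and the function names `preprocess` and `train_val_test_split` are assumptions for illustration; the abstract does not specify them. Python's `\w` is Unicode-aware for `str` patterns, so the tokenizer also handles Cyrillic text.

```python
import random
import re
import unicodedata


def preprocess(text: str) -> list[str]:
    """Clean, normalize and tokenize one document (illustrative rules only)."""
    text = re.sub(r"<[^>]+>", " ", text)                # cleaning: drop HTML-like tags
    text = unicodedata.normalize("NFKC", text).lower()  # normalization: Unicode form + case
    text = re.sub(r"\d", "0", text)                     # normalization: mask digits
    return re.findall(r"\w+", text)                     # tokenization: runs of word characters


def train_val_test_split(items, val=0.15, test=0.15, seed=42):
    """Shuffle once with a fixed seed, then cut into train / validation / test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test)
    n_val = round(len(items) * val)
    test_part = items[:n_test]
    val_part = items[n_test:n_test + n_val]
    train_part = items[n_test + n_val:]
    return train_part, val_part, test_part


print(preprocess("<p>Passport No. 123456, CONFIDENTIAL</p>"))
# prints ['passport', 'no', '000000', 'confidential']

train_set, val_set, test_set = train_val_test_split(range(100))
print(len(train_set), len(val_set), len(test_set))  # prints 70 15 15
```

Masking digits is only one possible design choice: it maps different passport or account numbers to the same token shape, which may help a bag-of-words classifier generalize to identifiers it has never seen. The fixed seed makes the split reproducible across runs.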
The results confirm the promise of machine learning for automated protection of confidential information and can be useful for security system developers, information security researchers, and practitioners involved in data protection. The proposed method has potential for integration into corporate information management systems and contributes to improving the overall level of cybersecurity.