METHOD FOR CLASSIFICATION OF CONFIDENTIAL INFORMATION USING MACHINE LEARNING
DOI:
https://doi.org/10.31891/2307-5732-2025-357-13
Keywords:
data classification, confidential information, machine learning, Naive Bayes classifier, Laplace smoothing, SVM, information security
Abstract
This article focuses on the development and improvement of a confidential information classification method based on text data analysis using machine learning. The primary goal of the research was to create an effective tool for automatic detection of sensitive information in structured and unstructured data to enhance their protection. The method employs machine learning algorithms, specifically the Naive Bayes classifier with Laplace smoothing, and Support Vector Machine (SVM) for comparative evaluation. The dataset used for training and testing included real corporate data, public datasets, and synthetic examples, ensuring high diversity and representativeness.
The developed method demonstrates high accuracy (92%), recall (90%), and F1-score (91%), outperforming traditional approaches such as Naive Bayes (84% accuracy) and SVM (88% accuracy). Special attention was given to handling rare or unique data classes by applying Laplace smoothing, which significantly improved the model’s robustness and stability. This approach enables more reliable identification of confidential data, which is critical for ensuring information security in organizations.
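To illustrate why Laplace smoothing matters for rare terms, the following is a minimal sketch of a multinomial Naive Bayes classifier with additive smoothing. It is not the authors' implementation: the function names (train_nb, log_likelihood, predict), the smoothing constant alpha=1.0, the class labels, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with additive (Laplace) smoothing.

    docs is a list of token lists; labels is a parallel list of class names.
    alpha=1.0 is the classic Laplace setting (an assumption, not from the article).
    """
    counts = defaultdict(Counter)  # per-class token counts
    vocab = set()
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
        vocab.update(tokens)
    class_sizes = Counter(labels)
    return {
        "alpha": alpha,
        "vocab": vocab,
        "counts": counts,
        "totals": {c: sum(counts[c].values()) for c in class_sizes},
        "log_prior": {c: math.log(n / len(labels)) for c, n in class_sizes.items()},
    }


def log_likelihood(model, token, label):
    """log P(token | label) with smoothing: (count + alpha) / (total + alpha * |V|)."""
    num = model["counts"][label][token] + model["alpha"]
    den = model["totals"][label] + model["alpha"] * len(model["vocab"])
    return math.log(num / den)


def predict(model, tokens):
    """Return the class with the highest posterior log-score."""
    scores = {}
    for label, prior in model["log_prior"].items():
        # Tokens never seen in training are skipped; tokens seen only in
        # other classes still get a non-zero probability from smoothing.
        scores[label] = prior + sum(
            log_likelihood(model, t, label) for t in tokens if t in model["vocab"]
        )
    return max(scores, key=scores.get)


# Hypothetical toy corpus, already tokenized.
docs = [
    ["salary", "passport", "number"],
    ["passport", "scan", "attached"],
    ["lunch", "menu", "today"],
    ["team", "lunch", "friday"],
]
labels = ["confidential", "confidential", "public", "public"]
model = train_nb(docs, labels)
print(predict(model, ["passport", "menu"]))
```

Without smoothing, "menu" would have zero probability under the confidential class and "passport" zero probability under the public class, so both class scores would collapse to zero. With alpha=1.0 each unseen pair receives a small non-zero probability, and the document is still scored on its remaining evidence.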
The article also details the data preparation process, including cleaning, tokenization, normalization, and splitting into training, validation, and test sets. A comparative analysis of different algorithms’ effectiveness was conducted, with results presented using accuracy, recall, and F1-score metrics. Recommendations for further improvements include expanding functionality, enhancing scalability, and adapting to specific industry requirements.
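The preparation and evaluation steps named above could look roughly like the sketch below. The abstract does not give the cleaning rules, the split proportions, or the evaluation code, so the regular expression, the 70/15/15 split, the fixed seed, and the helper names (preprocess, split_dataset, binary_metrics) are assumptions chosen only to make the pipeline concrete.

```python
import random
import re


def preprocess(text):
    """Normalize to lowercase, strip non-alphanumeric characters, and tokenize."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # assumed cleaning rule (ASCII only)
    return text.split()


def split_dataset(items, train=0.7, val=0.15, seed=0):
    """Shuffle and split into training, validation, and test sets.

    The 70/15/15 proportions are an assumption; the remainder goes to the test set.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def binary_metrics(y_true, y_pred, positive="confidential"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical usage on made-up labels.
y_true = ["confidential", "confidential", "confidential", "public", "public"]
y_pred = ["confidential", "confidential", "public", "confidential", "public"]
print(preprocess("Passport No. 12345!"))
print(binary_metrics(y_true, y_pred))
```

Because F1 is the harmonic mean of precision and recall, it can only be interpreted alongside both of them, which is why the sketch returns precision in addition to the three metrics listed in the abstract.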
The results confirm the promise of machine learning for automated protection of confidential information and can be useful for security system developers, information security researchers, and practitioners involved in data protection. The proposed method has potential for integration into corporate information management systems and contributes to improving the overall level of cybersecurity.
License
Copyright (c) 2025 БОГДАН ПАЛІЙЧУК, ЕДУАРД МАНЗЮК, ТЕТЯНА СКРИПНИК, ОЛЕКСАНД ПАСІЧНИК (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.