INFORMATION TECHNOLOGY OF DISINFORMATION DATASET DEVELOPMENT USING INTELLIGENT SEARCH OF DEEPFAKES AND CLICKBAITS

Authors

O. Lozynska, O. Markiv, V. Vysotska, R. Romanchuk, M. Nazarkevych

DOI:

https://doi.org/10.31891/2307-5732-2024-343-6-24

Keywords:

statistical characteristics of indicators, disinformation, dataset, intelligent search for disinformation, changes in the behavioral dynamics of chat participants, detection of fakes and propaganda, accuracy, precision, recall, F1-score

Abstract

The paper presents a methodology for building and populating a dataset of fakes, which is then used to train and test a model that identifies disinformation and propaganda, determines the features of their primary sources and distribution routes, and finds criteria and parameters describing changes in the behavioral dynamics of chat participants using intelligent search tools. Disinformation criteria based on the Rabat Plan of Action are described in the context of the research topic. Existing methods of intelligent search for disinformation are reviewed, the features of fact-checking sites are analyzed, and examples of populating an up-to-date dataset of fakes for the period after the full-scale invasion of Ukraine are given. Existing anti-disinformation strategies are analyzed, and two types of fakes, deepfakes and clickbait, are described.

Experiments were conducted on the developed dataset with machine learning models, in particular a TF-IDF-based model and a BERT-based model. The results of training and testing are reported using accuracy, precision, recall, and F1-score. The model achieves an overall accuracy of 0.846, meaning that 84.6% of all its predictions were correct. For class 0 (true texts), the model has a precision of 0.78 and a perfect recall of 1.00: it finds every true text, although some texts it labels as true are in fact disinformation. For class 1 (disinformation), the model has a perfect precision of 1.00 but a lower recall of 0.67: every text it flags as disinformation really is disinformation, but it misses about one third of the disinformation cases. The F1-score is the harmonic mean of precision and recall; it is 0.88 for class 0 and 0.80 for class 1, summarizing both metrics in a single score per class. Overall, the results indicate that the program works and fulfills its main task of detecting disinformation.
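The per-class figures above all follow from a single confusion matrix. The sketch below shows that arithmetic under a loud assumption: the 13-text test set (7 true texts, 6 disinformation texts, 2 of the latter misclassified as true) is hypothetical, chosen only because it is one set of counts that reproduces the reported metrics; the paper's actual counts are not given here. The function and variable names are illustrative, not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass
class ClassReport:
    precision: float
    recall: float
    f1: float


def per_class_report(cm: list[list[int]], cls: int) -> ClassReport:
    """cm[i][j] = number of texts whose true label is i and predicted label is j."""
    n = len(cm)
    tp = cm[cls][cls]
    fp = sum(cm[i][cls] for i in range(n) if i != cls)  # other classes predicted as cls
    fn = sum(cm[cls][j] for j in range(n) if j != cls)  # cls predicted as something else
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # 2TP / (2TP + FP + FN) equals the harmonic mean 2PR / (P + R);
    # the count form avoids accumulating floating-point rounding error.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return ClassReport(precision, recall, f1)


def accuracy(cm: list[list[int]]) -> float:
    """Share of all predictions on the diagonal (i.e. correct)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# HYPOTHETICAL confusion matrix, not taken from the paper.
# Rows = true class, columns = predicted class; 0 = true text, 1 = disinformation.
CM = [
    [7, 0],  # all 7 true texts classified as true
    [2, 4],  # 2 of 6 disinformation texts missed (predicted as true)
]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(CM):.3f}")
    for cls in (0, 1):
        r = per_class_report(CM, cls)
        print(f"class {cls}: precision={r.precision:.2f} recall={r.recall:.2f} f1={r.f1:.2f}")
```

Run as a script, this prints an accuracy of 0.846, then precision 0.78, recall 1.00, F1 0.88 for class 0 and precision 1.00, recall 0.67, F1 0.80 for class 1. Other count combinations (for example, doubling every cell) give the same ratios, so the reported metrics alone do not pin down the test-set size.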

Published

2024-11-28

How to Cite

LOZYNSKA, O., MARKIV, O., VYSOTSKA, V., ROMANCHUK, R., & NAZARKEVYCH, M. (2024). INFORMATION TECHNOLOGY OF DISINFORMATION DATASET DEVELOPMENT USING INTELLIGENT SEARCH OF DEEPFAKES AND CLICKBAITS. Herald of Khmelnytskyi National University. Technical Sciences, 343(6(1)), 158-167. https://doi.org/10.31891/2307-5732-2024-343-6-24