METHOD FOR INTERPRETING FAKE NEWS DETECTION RESULTS USING LARGE LANGUAGE MODEL
DOI:
https://doi.org/10.31891/2307-5732-2025-359-91

Keywords:
fake news, LLM, XAI, Integrated Gradients, SHAP, UMAP, t-SNE, human-in-the-loop

Abstract
The proliferation of sophisticated disinformation campaigns necessitates not only accurate detection but also a clear, justifiable understanding of how and why a model reaches its conclusions. To this end, we propose a method founded on a transparent and reproducible approach that uniquely integrates local explainable artificial intelligence (XAI) with global feature analysis, all operating within an interactive human-in-the-loop (HITL) cycle. At the local level, our method employs powerful attribution techniques—namely, Integrated Gradients and SHAP—to provide fine-grained, instance-level explanations. These tools deconstruct a model's prediction for any given news article, highlighting the specific words, phrases, and semantic patterns that most heavily influenced its classification as either authentic or fake. Complementing this granular analysis, we utilize global feature projection methods, such as t-SNE and UMAP, to visualize the entire data space in lower dimensions. This offers a macro-level perspective, revealing the distinct clusters formed by fake and real news, identifying outliers, and illuminating the model's overall decision boundaries. The synergy between these local and global views, governed by the HITL cycle, empowers analysts to iteratively refine the model, correct misclassifications, and build robust, trustworthy systems. To validate the performance of our method, we implemented and rigorously tested a DistilBERT model across several diverse data corpora. The model's performance was quantitatively assessed using a suite of standard metrics, including Accuracy (ACC), Precision/Recall/F1-score, and AUROC, while its classification behavior was qualitatively analyzed through confusion matrices and ROC curves. 
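The local attribution step described above can be illustrated with a minimal, self-contained sketch of Integrated Gradients. This is not the paper's DistilBERT pipeline (which would typically use a library such as Captum); instead, a toy logistic "fake-news" scorer stands in for the model so the path-integral approximation and its completeness axiom are visible in a few lines. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    # Toy differentiable scorer: P(fake) for a feature vector x.
    return sigmoid(np.dot(w, x) + b)

def model_grad(x, w, b):
    # Analytic gradient of sigmoid(w.x + b) with respect to x.
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    # Midpoint Riemann-sum approximation of the gradient path
    # integral from the baseline to the input x.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([model_grad(baseline + a * (x - baseline), w, b)
                      for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

rng = np.random.default_rng(0)
w = rng.normal(size=8)
b = -0.2
x = rng.normal(size=8)
baseline = np.zeros(8)          # "neutral" input, e.g. an all-pad text

attr = integrated_gradients(x, baseline, w, b)
# Completeness axiom: attributions sum to F(x) - F(baseline).
gap = model(x, w, b) - model(baseline, w, b)
print(np.allclose(attr.sum(), gap, atol=1e-4))
```

In the full method, each attribution score would be aggregated per token, so the words driving a "fake" verdict can be highlighted for the analyst in the HITL loop.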
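The global projection step can likewise be sketched with t-SNE (UMAP behaves analogously but needs the separate umap-learn package, so scikit-learn's built-in t-SNE is used here). Synthetic Gaussian clusters stand in for the real sentence embeddings of authentic and fake articles; the cluster parameters are assumptions for illustration only.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(42)
# Stand-in for document embeddings: two clusters ("real" vs "fake").
real = rng.normal(loc=0.0, scale=1.0, size=(100, 32))
fake = rng.normal(loc=4.0, scale=1.0, size=(100, 32))
X = np.vstack([real, fake])
labels = np.array([0] * 100 + [1] * 100)

# Project the 32-D embedding space down to 2-D for visual inspection
# of cluster structure, outliers, and decision-boundary geometry.
proj = TSNE(n_components=2, perplexity=30, init="pca",
            random_state=0).fit_transform(X)
print(proj.shape)
```

Plotting `proj` colored by `labels` (e.g. with matplotlib) yields the macro-level map described above, where well-separated clusters and stray points between them flag candidates for analyst review.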
The results obtained demonstrate a high degree of consistency with established benchmarks and findings from open publications in the 2020–2025 period, thereby confirming the reliability, validity, and reproducibility of our proposed interpretive approach.
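The quantitative evaluation described above can be sketched with scikit-learn's standard metric functions. The label and probability arrays below are fabricated toy values, not the study's results; they only show how ACC, Precision/Recall/F1, AUROC, and the confusion matrix are computed from a classifier's outputs.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support, roc_auc_score)

# Hypothetical outputs: P(fake) per article, thresholded at 0.5.
y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.2, 0.7, 0.3, 0.6])
y_pred = (y_prob >= 0.5).astype(int)

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                                   average="binary")
auroc = roc_auc_score(y_true, y_prob)   # threshold-free ranking quality
cm = confusion_matrix(y_true, y_pred)   # rows: true class, cols: predicted

print(f"ACC={acc:.2f} P={prec:.2f} R={rec:.2f} F1={f1:.2f} AUROC={auroc:.2f}")
print(cm)
```

The same calls, fed the DistilBERT model's predictions on each corpus, produce the metric suite reported in the abstract; `sklearn.metrics.roc_curve` would supply the points for the ROC plots.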
License
Copyright (c) 2025 СТЕФАНІЯ ВОВК, ПАВЛО РАДЮК, ТЕТЯНА СКРИПНИК (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.