ANALYSIS OF THE APPLICATION OF ARTIFICIAL INTELLIGENCE FOR THE IDENTIFICATION OF MALICIOUS APK FILES IN THE ANDROID ENVIRONMENT
DOI:
https://doi.org/10.31891/2307-5732-2025-351-15
Keywords:
artificial intelligence, malicious APK files, Android, XGBClassifier, RandomForestClassifier, mobile application security
Abstract
This article presents a comprehensive examination of how artificial intelligence (AI) can be employed to detect malicious APK files within the Android environment. The research investigates machine learning algorithms, specifically XGBClassifier and RandomForestClassifier, which have demonstrated robust capabilities in analyzing diverse datasets and identifying harmful content. By leveraging these algorithms, the study aims to streamline the process of classifying APK files according to their threat level and thereby improve the overall security of mobile ecosystems. A crucial aspect of the proposed method involves the transformation of textual data into meaningful embeddings using the OpenAI text-embedding-3-small model, enabling the algorithms to capture semantic information and enhance the accuracy of threat detection.
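As a rough illustration of the classification step described above, the following minimal sketch trains a bagging model and a boosting model on embedding-like vectors. It is not the article's implementation and rests on several assumptions: the vectors are synthetic Gaussian data standing in for real text-embedding-3-small output (no OpenAI API call is made), scikit-learn's GradientBoostingClassifier is swapped in for XGBClassifier so that the example depends only on scikit-learn and NumPy, and the helper names `make_fake_embeddings` and `compare_classifiers` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split


def make_fake_embeddings(n_per_class=200, dim=32, seed=0):
    """Build synthetic vectors standing in for APK text embeddings (NOT real data)."""
    rng = np.random.default_rng(seed)
    # Assumption: benign and malicious samples form two shifted Gaussian clusters.
    benign = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, dim))
    malicious = rng.normal(loc=0.7, scale=1.0, size=(n_per_class, dim))
    X = np.vstack([benign, malicious])
    y = np.array([0] * n_per_class + [1] * n_per_class)  # 0 = benign, 1 = malicious
    return X, y


def compare_classifiers(X, y, seed=0):
    """Fit a bagging and a boosting model on the same split; return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        "bagging (RandomForestClassifier)": RandomForestClassifier(
            n_estimators=100, random_state=seed
        ),
        # Stand-in for XGBClassifier, which would need the separate xgboost package.
        "boosting (GradientBoostingClassifier)": GradientBoostingClassifier(
            random_state=seed
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out data
    return scores


X, y = make_fake_embeddings()
scores = compare_classifiers(X, y)
for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.3f}")
```

In a real pipeline the synthetic matrix `X` would be replaced by embeddings computed from features extracted from each APK, and the accuracies printed here say nothing about performance on actual malware.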
The proposed workflow addresses several key challenges in the field of mobile security. First, the feature extraction process is designed to incorporate code analysis and metadata inspection to ensure that all relevant signals are captured prior to training. Second, the use of ensemble methods underscores the importance of combining different machine learning techniques for improved prediction performance. By comparing the results of XGBClassifier and RandomForestClassifier, the research provides insights into the relative strengths of gradient boosting versus bagging approaches. Moreover, experimental evaluations are conducted on both benign and malicious APK files, drawn from an open-source malware database, to validate the effectiveness and reliability of the classification pipeline. The outcomes indicate that AI-driven methodologies can significantly reduce false positives and false negatives, paving the way for more efficient, large-scale security solutions in Android environments. Beyond the empirical evaluation, the article examines practical considerations for deploying these models in real-world applications, including scalability, model interpretability, and resource constraints. Finally, the research highlights future directions, such as exploring novel embedding techniques, refining hyperparameters for enhanced performance, and extending the training pipeline to encompass evolving malware patterns. These findings underscore the potential of AI-based solutions to play a pivotal role in safeguarding millions of Android users against ever-emerging cyber threats.
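The false positive and false negative figures mentioned above can be made concrete with a small worked example. The sketch below computes both error rates from a list of true labels and predictions; the labels are invented for illustration (they are not results from the article), and the function names `confusion_counts` and `error_rates` are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = malicious, 0 = benign)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1  # malicious APK correctly flagged
        elif pred == 1 and truth == 0:
            fp += 1  # benign APK wrongly flagged
        elif pred == 0 and truth == 0:
            tn += 1  # benign APK correctly passed
        else:
            fn += 1  # malicious APK missed
    return tp, fp, tn, fn


def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # share of benign apps flagged
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # share of malware missed
    return fpr, fnr


# Invented labels for illustration only: 4 benign and 4 malicious samples.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]

fpr, fnr = error_rates(y_true, y_pred)
print(f"false positive rate = {fpr:.2f}, false negative rate = {fnr:.2f}")
```

Reporting the two rates separately matters in this setting because a missed malicious APK and a wrongly blocked benign app carry different costs, which a single accuracy number would hide.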
License
Copyright (c) 2025 МИКОЛА ГАВРИЛОВ (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.