COMPARATIVE STUDY OF MACHINE LEARNING METHODS FOR STREAMING DATA PROCESSING
DOI: https://doi.org/10.31891/2307-5732-2026-361-54
Keywords: machine learning, LightGBM, XGBoost, Kafka, forecasting
Abstract
The object of this research is the analysis of large-scale streaming cryptocurrency data with machine learning algorithms. Terabyte-scale, high-velocity data streams pose a critical challenge because classical machine learning methods are limited in both computational cost and accuracy when faced with millions of temporal records. The principal result is a distributed processing pipeline built around a Feature Store architecture. With this pipeline, LightGBM and XGBoost achieved superior predictive performance (R² of 0.9998 and 0.9997, respectively) while processing 1.33 million streaming records across 100 cryptocurrency pairs. The methodology included a comprehensive feature engineering phase that extracted temporal, statistical, and technical indicators, such as rolling means, volatility measures, and lagged price values, which are crucial for capturing dependencies in big data. The performance advantage is attributed to the architectural capabilities of gradient boosting algorithms. The proposed pipeline moves from conventional linear approaches to tree-based ensemble methods with optimized memory management, demonstrating that gradient boosting algorithms offer the computational efficiency and pattern recognition capabilities that Decision Tree, Random Forest, and regression methods lack. In practice, the findings give big data practitioners clear guidelines. The Feature Store architecture with temporal stratified sampling is a scalable framework that achieved a 5.7x data reduction and nearly 82% memory savings. For production systems handling high-velocity streaming data, gradient boosting algorithms (particularly LightGBM, with a training time of 0.63 s) are preferable to traditional methods for achieving both accuracy and computational efficiency.
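The feature engineering step named in the abstract (rolling means, volatility measures, lagged price values) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `build_features`, the window and lag sizes, the use of the standard deviation of simple returns as the volatility measure, and the sample prices are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def build_features(prices, window=3, lag=1):
    """Build one feature row per time step that has enough history.

    Hypothetical helper: the window, lag, and volatility definition are
    assumptions, not values taken from the paper.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute returns")
    rows = []
    start = max(window, lag)
    for t in range(start, len(prices)):
        # Use only values strictly before t so the target does not leak
        # into its own features.
        history = prices[t - window:t]
        # Simple returns between consecutive prices in the window.
        returns = [history[i] / history[i - 1] - 1.0
                   for i in range(1, len(history))]
        rows.append({
            "t": t,
            "rolling_mean": mean(history),
            "volatility": pstdev(returns),  # std. dev. of in-window returns
            "lag_price": prices[t - lag],
            "target": prices[t],            # price to be forecast
        })
    return rows


# Illustrative prices for a single trading pair (made-up values).
prices = [100.0, 102.0, 101.0, 105.0, 107.0]
features = build_features(prices, window=3, lag=1)
for row in features:
    print(row)
```

Rows of this shape could then be passed to a gradient boosting regressor; in a streaming setting the same computation would typically run per trading pair over a sliding buffer rather than over a full list.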
License
Copyright (c) 2026 ІВАН ХАМАР, ІГОР ОЛЕНИЧ (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.