ANALYTICAL REVIEW OF METHODS AND TECHNOLOGIES FOR PROCESSING LARGE VOLUMES OF DATA IN DECENTRALIZED SYSTEMS
DOI:
https://doi.org/10.31891/2307-5732-2026-365-8
Keywords:
Apache Kafka, Hadoop, MapReduce, Apache Spark, Apache Flink, Big Data, decentralized systems, distributed systems, data-intensive applications, scalability, fault tolerance, consistency, stream processing, batch processing, data pipelines
Abstract
The article presents an analytical review of methods and technologies for processing large volumes of data in decentralized systems. It considers the key engineering trade-offs of data-intensive applications, in particular scalability, consistency, reliability, and performance, as well as the ability to remain operable as data volumes and arrival rates grow. The logic of building end-to-end data pipelines is explained, including data ingestion, storage, transformation, and delivery to consumers, taking operational constraints into account.
Special attention is paid to the practical evaluation of data processing systems through the characteristics of the expected workload and the analysis of system behavior during scaling: how performance changes when load grows with fixed resources, and how many resources must be added to maintain target indicators. The article notes the difference between the evaluation criteria for batch systems, in particular throughput and job execution time, and those for online services, for which response time is key.
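The two kinds of evaluation criteria mentioned above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the nearest-rank percentile method, and the sample latencies are hypothetical and are not taken from the article.

```python
import math


def throughput(records_processed: int, elapsed_seconds: float) -> float:
    """Batch-style metric: records processed per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return records_processed / elapsed_seconds


def percentile(samples: list[float], p: float) -> float:
    """Online-style metric: nearest-rank percentile of response times.

    p is a percentage in the range (0, 100].
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least p% of samples at or below it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical measurements, for illustration only.
response_times_ms = [12, 15, 11, 14, 250, 13, 16, 12, 15, 14]

batch_rate = throughput(records_processed=1_000_000, elapsed_seconds=50)
median_ms = percentile(response_times_ms, 50)
p99_ms = percentile(response_times_ms, 99)
```

In this sample a single slow request barely moves the median but dominates the 99th percentile, which is one reason online services are usually judged by high percentiles rather than by averages.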
The structural components of modern decentralized data architectures are systematically described, including distributed storage systems, caching layers, indexing mechanisms, messaging brokers, stream processing engines, and batch analytics frameworks. Particular attention is devoted to technologies such as Apache Kafka, Apache Hadoop, Apache Spark, and Apache Flink. Their architectural models, processing semantics (at-most-once, at-least-once, exactly-once), and suitability for real-time versus batch workloads are comparatively analyzed. The role of these platforms in decentralized big data ecosystems is evaluated with respect to elasticity, state management, checkpointing, and fault tolerance.
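The difference between at-least-once delivery and an effectively-once result can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the API of Kafka, Spark, or Flink: `deliver_at_least_once`, `IdempotentSum`, and the lost-acknowledgement set are hypothetical names introduced only for illustration.

```python
from typing import Iterable, Iterator


def deliver_at_least_once(
    messages: Iterable[int], lost_acks: set[int]
) -> Iterator[tuple[int, int]]:
    """Yield (offset, value) pairs; redeliver a message whose ack was lost.

    Assumption: a lost acknowledgement causes exactly one redelivery.
    """
    for offset, value in enumerate(messages):
        yield offset, value
        if offset in lost_acks:
            # The sender saw no acknowledgement, so it sends the message again.
            yield offset, value


class IdempotentSum:
    """Consumer that ignores offsets it has already applied."""

    def __init__(self) -> None:
        self.total = 0
        self.seen: set[int] = set()

    def apply(self, offset: int, value: int) -> None:
        if offset in self.seen:
            return  # duplicate delivery, already reflected in the state
        self.seen.add(offset)
        self.total += value


messages = [10, 20, 30]

# Naive consumer: duplicates are counted twice under at-least-once delivery.
naive_total = sum(v for _, v in deliver_at_least_once(messages, lost_acks={1}))

# Idempotent consumer: the same stream yields the correct result.
consumer = IdempotentSum()
for offset, value in deliver_at_least_once(messages, lost_acks={1}):
    consumer.apply(offset, value)
```

The sketch shows one common way to approach exactly-once results: keep at-least-once delivery and make the state update idempotent. Production engines typically combine this idea with checkpointing or transactional writes, which are outside the scope of this example.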
The article concludes that effective processing of large-scale data in decentralized systems requires an integrated approach that combines architectural modularity, workload-aware optimization, and continuous performance evaluation. The synthesis of streaming and batch paradigms, along with adaptive resource management strategies, forms the technological foundation for resilient and scalable big data infrastructures.
License
Copyright (c) 2026 ВІТАЛІЙ ГУСАК (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.