ANALYTICAL REVIEW OF METHODS AND TECHNOLOGIES FOR PROCESSING LARGE VOLUMES OF DATA IN DECENTRALIZED SYSTEMS

VITALII HUSAK

doi:10.31891/2307-5732-2026-365-8

Authors

VITALII HUSAK, Lviv Polytechnic National University, https://orcid.org/0009-0002-8415-2767

DOI:

https://doi.org/10.31891/2307-5732-2026-365-8

Keywords:

Apache Kafka, Hadoop, MapReduce, Apache Spark, Apache Flink, Big Data, decentralized systems, distributed systems, data-intensive applications, scalability, fault tolerance, consistency, stream processing, batch processing, data pipelines

Abstract

The article presents an analytical review of methods and technologies for processing large volumes of data in decentralized systems. It considers the key engineering trade-offs of data-intensive applications, in particular scalability, consistency, reliability, performance, and the ability to remain operational as data volumes and data rates grow. It also explains the logic of building end-to-end data pipelines, covering data ingestion, storage, transformation, and delivery to consumers under operational constraints.

Special attention is paid to the practical evaluation of data processing systems through the characteristics of the expected workload and an analysis of system behavior under scaling: how performance changes when load grows while resources stay fixed, and how many resources must be added to keep target indicators as load grows. The article also notes the difference between the evaluation criteria for batch systems, in particular throughput and job execution time, and those for online services, for which response time is key.
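The distinction between batch and online evaluation criteria can be illustrated with a minimal Python sketch. It is not taken from the article: the function names, the nearest-rank percentile method, and all sample numbers are assumptions chosen only for illustration.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile (p in 0..100) of an already sorted list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def batch_throughput(records_processed, elapsed_seconds):
    """Batch criterion: records processed per second over the whole job."""
    return records_processed / elapsed_seconds


def response_time_summary(latencies_ms):
    """Online criterion: percentiles of per-request response times."""
    ordered = sorted(latencies_ms)
    return {
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
    }


# Hypothetical measurements, invented for this example.
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 12, 18, 900]
summary = response_time_summary(latencies_ms)
throughput = batch_throughput(1_000_000, 250.0)

print(summary)
print(throughput)
```

In this made-up sample the median response time stays low while the high percentiles are dominated by two slow requests, which is why online services are usually judged by percentiles rather than by an average, whereas a batch job is summarized by a single throughput figure.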

The structural components of modern decentralized data architectures are systematically described, including distributed storage systems, caching layers, indexing mechanisms, message brokers, stream processing engines, and batch analytics frameworks. Particular attention is devoted to technologies such as Apache Kafka, Apache Hadoop, Apache Spark, and Apache Flink. Their architectural models, processing semantics (at-most-once, at-least-once, exactly-once), and suitability for real-time versus batch workloads are compared. The role of these platforms in decentralized big data ecosystems is evaluated with respect to elasticity, state management, checkpointing, and fault tolerance.
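The processing semantics named above can be made concrete with a small simulation. The sketch below is an assumption-laden illustration in plain Python, not the API of Kafka, Spark, or Flink: `IdempotentSink` and `deliver_at_least_once` are hypothetical names, and the lost acknowledgement is simulated. It shows at-least-once delivery (retry until acknowledged) combined with an idempotent consumer, one common way to obtain an exactly-once effect on the result.

```python
class IdempotentSink:
    """Hypothetical consumer that applies each message id at most once."""

    def __init__(self):
        self.total = 0
        self.seen_ids = set()

    def apply(self, msg_id, amount):
        # A redelivered message is detected by its id and skipped.
        if msg_id in self.seen_ids:
            return False
        self.seen_ids.add(msg_id)
        self.total += amount
        return True


def deliver_at_least_once(messages, sink, lose_ack_for):
    """Deliver every message, retrying when the acknowledgement is lost.

    The first acknowledgement for each id in lose_ack_for is treated as
    lost (simulated), so that message is delivered a second time.
    Returns the total number of delivery attempts.
    """
    deliveries = 0
    pending_loss = set(lose_ack_for)
    for msg_id, amount in messages:
        while True:
            sink.apply(msg_id, amount)
            deliveries += 1
            if msg_id in pending_loss:
                pending_loss.discard(msg_id)  # ack lost: sender retries
                continue
            break  # ack received: move on to the next message
    return deliveries


# Hypothetical messages: (id, amount).
messages = [("m1", 10), ("m2", 20), ("m3", 30)]
sink = IdempotentSink()
deliveries = deliver_at_least_once(messages, sink, lose_ack_for={"m2"})

print(deliveries, sink.total)
```

Without the deduplication step, the retried message would be counted twice, which is the typical failure mode of plain at-least-once delivery; dropping the retry loop instead would give at-most-once behavior, where a lost message is never recovered.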

The article concludes that effective processing of large-scale data in decentralized systems requires an integrated approach that combines architectural modularity, workload-aware optimization, and continuous performance evaluation. The synthesis of streaming and batch paradigms, along with adaptive resource management strategies, forms the technological foundation for resilient and scalable big data infrastructures.
