HYBRID APPROACH TO PARALLEL COMPUTING FOR EFFICIENT BIG DATA ANALYTICS

Authors

R. Triska, L. Hentosh

DOI:

https://doi.org/10.31891/2307-5732-2025-355-93

Keywords:

parallel computing, big data analytics, task scheduling, high-performance computing, data pipeline optimization

Abstract

The rapid growth of data volumes in science, business, and digital infrastructure has made big data analytics a key driver of analysis, decision-making, and innovation. Because of the defining characteristics of big data (volume, velocity, and variety, the “3Vs”), traditional sequential computing models can no longer keep pace. As a result, parallel computing has become critically important, offering high performance, scalability, and energy efficiency in data processing.

This article compares three main parallel computing strategies: CPU-based systems, GPU architectures, and distributed computing. Each has its own advantages and limitations. Central processing units (CPUs) provide multithreaded execution for general-purpose tasks but are limited by their number of cores and by memory bandwidth. Graphics processing units (GPUs) offer massive parallelism suited to compute-intensive workloads but are constrained by limited device memory and by the overhead of transferring data between host and device. Distributed systems (such as Apache Spark, Dask, and Ray) enable horizontal scaling and provide elasticity and fault tolerance, although they require complex inter-node coordination.
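To make the CPU-based strategy concrete, here is a minimal sketch, assuming an in-memory list of numbers: the data is split into chunks, each chunk is summarized by a worker (map), and the partial results are merged (reduce). The function names are hypothetical illustrations, not code from the paper. In CPython, threads share the global interpreter lock, so for CPU-bound pure-Python work it is passing `executor_cls=ProcessPoolExecutor` that actually spreads the computation across cores.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Sequence


def chunk(data: Sequence[float], n_chunks: int) -> list:
    """Split data into at most n_chunks contiguous, nearly equal slices."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_stats(part: Sequence[float]) -> tuple:
    """Map step: count, sum, and sum of squares for one chunk."""
    return len(part), sum(part), sum(x * x for x in part)


def parallel_mean_var(
    data: Sequence[float],
    workers: int = 4,
    executor_cls: Callable[..., Executor] = ThreadPoolExecutor,
) -> tuple:
    """Mean and population variance computed chunk-by-chunk on a worker pool.

    Hypothetical helper for illustration. The sum-of-squares formula is simple
    but can lose precision for large values; pairwise (Chan et al.) merging
    of partial statistics is more robust.
    """
    if not data:
        raise ValueError("data must be non-empty")
    with executor_cls(max_workers=workers) as pool:
        parts = list(pool.map(partial_stats, chunk(data, workers)))
    # Reduce step: combine the per-chunk partial statistics.
    n = sum(p[0] for p in parts)
    s = sum(p[1] for p in parts)
    sq = sum(p[2] for p in parts)
    mean = s / n
    return mean, sq / n - mean * mean
```

When switching to `ProcessPoolExecutor`, the call should sit under an `if __name__ == "__main__":` guard so that worker processes started with the "spawn" method can import the module safely.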

The paper analyzes the effectiveness of these paradigms in real-world scenarios, drawing on benchmarks of Spark and Dask, on GPU acceleration in analytical frameworks, and on hybrid models that combine MPI and OpenACC. It compares computing performance across these contexts to help determine which approach best suits a given task.
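The abstract does not state which metrics underlie the comparison. A common way to quantify parallel performance is speedup, S_p = T_1 / T_p, and parallel efficiency, E_p = S_p / p, where T_1 is the sequential run time, T_p the run time on p workers. The sketch below assumes these metrics; the helper names are hypothetical.

```python
import time
from typing import Callable


def best_time(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the fastest of `repeats` wall-clock runs of fn(), in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(t_seq: float, t_par: float) -> float:
    """S_p = T_1 / T_p: how many times faster the parallel run is."""
    return t_seq / t_par


def efficiency(t_seq: float, t_par: float, workers: int) -> float:
    """E_p = S_p / p: fraction of ideal linear scaling actually achieved."""
    return speedup(t_seq, t_par) / workers
```

For example, a job that takes 10 s sequentially and 2.5 s on 8 workers has a speedup of 4 but an efficiency of only 0.5, which is the kind of gap that coordination or data-transfer overhead tends to produce. Taking the best of several runs reduces noise from caching and scheduling jitter.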

The novelty of this study lies in the proposed concept of a hybrid framework that integrates all three strategies into a unified multi-level architecture: CPUs handle orchestration and lightweight tasks, GPUs perform parallel processing of compute-intensive workloads, and distributed systems provide scalable processing of large data volumes. This approach significantly improves resource utilization and overall system performance.
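The division of labor in the proposed framework can be illustrated with a simple routing rule. This is a minimal sketch under stated assumptions, not the authors' scheduler: the `Task` fields, the `NodeLimits` capacities, and the arithmetic-intensity threshold are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    CPU = "cpu"                  # orchestration and lightweight tasks
    GPU = "gpu"                  # compute-intensive, data-parallel work
    DISTRIBUTED = "distributed"  # data too large for a single node


@dataclass(frozen=True)
class Task:
    name: str
    data_bytes: int        # size of the input the task must touch
    flops_per_byte: float  # rough arithmetic intensity of the task


@dataclass(frozen=True)
class NodeLimits:
    # Hypothetical capacities for one node; tune per deployment.
    host_memory_bytes: int = 64 * 2**30
    gpu_memory_bytes: int = 16 * 2**30
    # Assumed cut-off below which host-to-device transfer overhead
    # is expected to outweigh the GPU speedup.
    gpu_min_intensity: float = 10.0


def route(task: Task, limits: NodeLimits = NodeLimits()) -> Tier:
    """Pick an execution tier for a task (illustrative heuristic only)."""
    if task.data_bytes > limits.host_memory_bytes:
        return Tier.DISTRIBUTED  # scale out horizontally
    if (task.flops_per_byte >= limits.gpu_min_intensity
            and task.data_bytes <= limits.gpu_memory_bytes):
        return Tier.GPU  # intensive and fits in device memory
    return Tier.CPU  # lightweight, or too big for the GPU but fits the host
```

A CPU-side orchestrator could build a plan such as `{t.name: route(t) for t in tasks}` and submit each task to the matching backend. Keeping the rule as a pure function makes it easy to test and to replace with a cost model once real measurements are available.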

Published

2025-08-28

How to Cite

Triska, R., & Hentosh, L. (2025). Hybrid approach to parallel computing for efficient big data analytics. Herald of Khmelnytskyi National University. Technical Sciences, 355(4), 654-659. https://doi.org/10.31891/2307-5732-2025-355-93