ENERGY EFFICIENCY AND TOTAL COST OF OWNERSHIP OF MULTI-LAYER DATA STORES

Authors

DOI:

https://doi.org/10.31891/2307-5732-2025-355-25

Keywords:

Lakehouse, energy footprint, TCO, Delta Lake, Milvus

Abstract

Enterprises now expect the same data platform to serve business-intelligence SQL, relationship analytics and large-language-model–driven semantic search. The practical response is a poly-store that combines a Lakehouse core with graph and vector indexes. Although performance benefits are well documented, quantitative evidence of operational footprint – energy demand, carbon emissions and total cost of ownership (TCO) – is scarce.

This paper presents a thirty-day, twelve-hours-per-day benchmark that compares an NVMe-backed ClickHouse cluster with a three-layer prototype (Delta Lake + Neo4j + Milvus) deployed on Microsoft Azure and Amazon Web Services. The workload blends 40 % TPC-DS OLAP queries, 30 % LDBC graph traversals, 20 % ANN-Bench vector searches and a 10 % change-data-capture (CDC) ingest stream. For every 100 000 successful queries, three quantities were recorded: watt-hours, read from the providers’ Energy/Emissions APIs; dollars, at April-2025 list prices; and a sustainability-adjusted TCO (TCO-S) that monetises CO₂-equivalent emissions at 80 $ t⁻¹.
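The abstract does not spell out how TCO-S is computed. A minimal sketch, assuming it is simply the dollar cost plus the emissions of the measured energy priced at 80 $ per tonne, might look as follows. The `Measurement` structure, the `tco_s` function, the additive formula and the sample numbers are all illustrative assumptions, not the paper's published method or data.

```python
from dataclasses import dataclass

CARBON_PRICE_USD_PER_TONNE = 80.0  # carbon price stated in the abstract
GRAMS_PER_TONNE = 1_000_000


@dataclass
class Measurement:
    """Readings for one batch of 100 000 successful queries (hypothetical structure)."""
    energy_wh: float                   # watt-hours reported for the batch
    cost_usd: float                    # dollars at list prices for the batch
    carbon_intensity_g_per_kwh: float  # grid intensity of the hosting region


def tco_s(m: Measurement, carbon_price: float = CARBON_PRICE_USD_PER_TONNE) -> float:
    """Assumed formula: financial cost plus monetised CO2e emissions."""
    energy_kwh = m.energy_wh / 1000.0
    emissions_tonnes = energy_kwh * m.carbon_intensity_g_per_kwh / GRAMS_PER_TONNE
    return m.cost_usd + emissions_tonnes * carbon_price


# Illustrative numbers only; these are not results from the paper.
sample = Measurement(energy_wh=5000.0, cost_usd=10.0, carbon_intensity_g_per_kwh=230.0)
print(round(tco_s(sample), 4))
```

Under this assumed formula the carbon term is small next to the financial term at typical grid intensities, so the indicator is dominated by the dollar cost unless energy use or grid intensity is high.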

Under steady load, the poly-store consumes around 34 % less electricity and lowers TCO-S by around 27 %, thanks to serverless compute de-allocation, specialised query engines and 2.7× columnar compression. A 30-minute CDC surge that quadruples the ingest rate doubles both metrics unless tiered SSD caching and simple back-pressure are activated; these mitigations cap the spike at +38 % energy and +31 % cost. Migrating only the object-storage bucket from a high-carbon region (around 230 g CO₂e kWh⁻¹) to a low-carbon one (around 25 g CO₂e kWh⁻¹) trims TCO-S by a further 11 % without breaching a 100 ms latency budget.
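To give a feel for the geographic lever, the short calculation below converts the two grid intensities quoted above into a monetised carbon cost per kilowatt-hour at the stated 80 $ t⁻¹. The function name `carbon_cost_per_kwh` is hypothetical, and the sketch covers only the carbon term for energy drawn in each region; it does not reproduce the 11 % TCO-S figure, which depends on measurements not given in the abstract.

```python
def carbon_cost_per_kwh(intensity_g_per_kwh: float,
                        carbon_price_usd_per_tonne: float = 80.0) -> float:
    """Dollars of monetised CO2e per kWh drawn from a grid of the given intensity."""
    tonnes_per_kwh = intensity_g_per_kwh / 1_000_000  # grams -> tonnes
    return tonnes_per_kwh * carbon_price_usd_per_tonne


high = carbon_cost_per_kwh(230.0)  # high-carbon region from the abstract
low = carbon_cost_per_kwh(25.0)    # low-carbon region from the abstract

print(f"high-carbon region: {high:.4f} $/kWh")
print(f"low-carbon region:  {low:.4f} $/kWh")
print(f"difference:         {high - low:.4f} $/kWh")
```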

The contribution is threefold: the first cloud-native dataset that unites relational, graph and vector modalities with energy metrics; a single-number TCO-S indicator that fuses financial and ESG perspectives; and a reproducible experimental setup that yields consistent results (≤ 5 % variance). The findings recommend Lakehouse poly-stores for everyday analytics, advise SSD caching for bursty ETL and highlight geography as a low-hanging optimization lever for carbon-aware data platforms.

Published

2025-08-28

How to Cite

ZUBAL, B. (2025). ENERGY EFFICIENCY AND TOTAL COST OF OWNERSHIP OF MULTI-LAYER DATA STORES. Herald of Khmelnytskyi National University. Technical Sciences, 355(4), 172-176. https://doi.org/10.31891/2307-5732-2025-355-25