ENERGY EFFICIENCY AND TOTAL COST OF OWNERSHIP OF MULTI-LAYER DATA STORES
DOI:
https://doi.org/10.31891/2307-5732-2025-355-25
Keywords:
Lakehouse, energy footprint, TCO, Delta Lake, Milvus
Abstract
Enterprises now expect the same data platform to serve business-intelligence SQL, relationship analytics and large-language-model–driven semantic search. The practical response is a poly-store that combines a Lakehouse core with graph and vector indexes. Although performance benefits are well documented, quantitative evidence of operational footprint – energy demand, carbon emissions and total cost of ownership (TCO) – is scarce.
This paper presents a thirty-day, twelve-hours-per-day benchmark that compares an NVMe-backed ClickHouse cluster with a three-layer prototype (Delta Lake + Neo4j + Milvus) deployed on Microsoft Azure and Amazon Web Services. The workload blends 40 % TPC-DS OLAP queries, 30 % LDBC graph traversals, 20 % ANN-Bench vector searches and a 10 % change-data-capture (CDC) ingest stream. For every 100 000 successful queries, three quantities were recorded: watt-hours obtained via the providers’ Energy/Emissions APIs, dollars at April 2025 list prices, and a sustainability-adjusted TCO (TCO-S) that monetises CO₂-equivalent emissions at 80 $ t⁻¹.
Under steady load, the poly-store consumes around 34 % less electricity and lowers TCO-S by around 27 %, thanks to serverless compute de-allocation, specialised query engines and 2.7× columnar compression. A 30-minute CDC surge that quadruples the ingest rate doubles both metrics unless tiered SSD caching and simple back-pressure are activated; these mitigations cap the spike at +38 % energy and +31 % cost. Migrating only the object-storage bucket from a high-carbon region (around 230 g CO₂e kWh⁻¹) to a low-carbon region (around 25 g CO₂e kWh⁻¹) trims TCO-S by a further 11 % without breaching a 100 ms latency budget.
The contribution is threefold: (1) the first cloud-native dataset that unites relational, graph and vector modalities with energy metrics; (2) the one-number TCO-S indicator, which fuses financial and ESG perspectives; and (3) a reproducible experimental setup that yields consistent results (≤ 5 % variance). The findings recommend Lakehouse poly-stores for everyday analytics, advise SSD caching for bursty ETL and highlight geography as a low-hanging optimization lever for carbon-aware data platforms.
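The abstract does not state the exact TCO-S formula. A minimal sketch of one plausible reading follows, assuming TCO-S is the dollar cost plus an additive carbon charge computed from metered energy and regional grid intensity. The function name `tco_s`, its parameters, and the sample cost and energy figures are hypothetical illustrations, not the paper's data; only the 80 $ t⁻¹ carbon price and the two regional intensities (around 230 and 25 g CO₂e kWh⁻¹) come from the abstract.

```python
# Carbon price stated in the abstract: 80 USD per tonne of CO2-equivalent.
CARBON_PRICE_USD_PER_TONNE = 80.0


def tco_s(cost_usd, energy_wh, intensity_g_per_kwh,
          carbon_price=CARBON_PRICE_USD_PER_TONNE):
    """Sustainability-adjusted cost for one batch of queries.

    Assumed (hypothetical) formula: TCO-S = cost + emissions * carbon price.
    cost_usd            -- billed cost for the batch, in USD
    energy_wh           -- metered energy for the batch, in watt-hours
    intensity_g_per_kwh -- grid carbon intensity, in g CO2e per kWh
    """
    energy_kwh = energy_wh / 1000.0
    emissions_tonnes = energy_kwh * intensity_g_per_kwh / 1_000_000.0
    return cost_usd + emissions_tonnes * carbon_price


if __name__ == "__main__":
    # Hypothetical batch: 10 USD and 5000 Wh per 100 000 queries.
    for label, intensity in (("high-carbon", 230.0), ("low-carbon", 25.0)):
        print(f"{label}: TCO-S = {tco_s(10.0, 5000.0, intensity):.3f} USD")
```

Under this reading, moving a workload to a lower-intensity region reduces only the carbon term; any change in list prices between regions would enter through `cost_usd`.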
License
Copyright (c) 2025 БОГДАН ЗУБАЛЬ (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.