INTELLIGENT SALES FORECASTING SYSTEM BASED ON ENSEMBLE MACHINE LEARNING MODELS AND SEMANTIC INTEGRATION OF CRM AND ERP DATA

Authors

D. Svyshch, T. Basyuk

DOI:

https://doi.org/10.31891/2307-5732-2026-365-91

Keywords:

sales forecasting, machine learning, semantic integration, CRM systems, ERP systems, ensemble models

Abstract

This paper addresses the problem of improving sales forecasting accuracy under market uncertainty through the semantic integration of CRM and ERP system data. The objective of the study is to design and implement an intelligent information system built on hybrid ensemble machine learning models. Given high market volatility and the frequent siloing of enterprise information systems, traditional demand forecasting methods that rely solely on transactional history perform poorly, particularly for niche "long-tail" products. Forecasts generated without customer behavioral indicators and inventory constraints create a disconnect between commercial planning and production capacity. The IDEF0 methodology was used to model the business processes of the proposed system, which allowed a detailed breakdown of the data collection and feature engineering stages. The software architecture follows the Data Lakehouse paradigm and includes a dedicated Feature Store module. The predictive core is a hybrid weighted ensemble of machine learning models: ARIMA stabilizes temporal trends, XGBoost processes non-linear signals originating from the CRM, and LSTM detects long-term patterns. The models were trained and validated on a dataset of over 4,000 active SKUs spanning an 18-month period. The scientific novelty lies in the proposed mechanism for dynamically allocating weights among the base models of the ensemble: each weight is set in inverse proportion to the model's local error within a sliding validation window. Results show that the ensemble model outperforms classical approaches, reducing the symmetric Mean Absolute Percentage Error (sMAPE) to 12.9%. Experimental findings confirm that deploying complex machine learning models is optimal when 12 to 18 months of historical data are available.
Furthermore, the study identifies a critical dependence of forecasting accuracy on the update frequency of input features: delaying CRM data processing by more than seven days leads to a sharp increase in the error rate. Putting these findings into practice minimizes the influence of the human factor on Sales and Operations Planning (S&OP) processes and reduces out-of-stock occurrences by 9%.
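
The weighting mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`smape`, `inverse_error_weights`, `ensemble_forecast`), the toy numbers, and the particular sMAPE variant (denominator equal to the mean of the absolute actual and forecast values) are assumptions made for illustration only. Each base model is scored on a recent validation window, and its weight is taken as the normalized reciprocal of that local error.

```python
from typing import Dict, Sequence, Tuple


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE in percent (assumed variant, range 0..200).

    Pairs where actual and forecast are both zero contribute zero error.
    """
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        if denom > 0:
            total += abs(a - f) / denom
    return 100.0 * total / len(actual)


def inverse_error_weights(errors: Dict[str, float], eps: float = 1e-9) -> Dict[str, float]:
    """Weights proportional to 1 / error, normalized to sum to 1.

    eps is a hypothetical guard against division by zero for a perfect model.
    """
    inverse = {name: 1.0 / (err + eps) for name, err in errors.items()}
    norm = sum(inverse.values())
    return {name: value / norm for name, value in inverse.items()}


def ensemble_forecast(
    actual_window: Sequence[float],
    window_forecasts: Dict[str, Sequence[float]],
    next_forecasts: Dict[str, float],
) -> Tuple[float, Dict[str, float]]:
    """Combine next-step forecasts using weights from the latest validation window."""
    errors = {name: smape(actual_window, preds) for name, preds in window_forecasts.items()}
    weights = inverse_error_weights(errors)
    combined = sum(weights[name] * next_forecasts[name] for name in weights)
    return combined, weights


if __name__ == "__main__":
    # Toy validation window for one SKU (illustrative numbers, not from the paper).
    actual = [100.0, 120.0, 90.0]
    window = {
        "arima": [110.0, 115.0, 95.0],
        "xgboost": [102.0, 118.0, 91.0],
        "lstm": [90.0, 130.0, 80.0],
    }
    upcoming = {"arima": 105.0, "xgboost": 98.0, "lstm": 110.0}
    forecast, weights = ensemble_forecast(actual, window, upcoming)
    print(round(forecast, 2), {k: round(v, 3) for k, v in weights.items()})
```

Sliding the window forward and recomputing the weights at each step makes the allocation dynamic: a model that has recently tracked demand well gains influence, and one that has drifted loses it.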

Published

2026-05-28

How to Cite

SVYSHCH, D., & BASYUK, T. (2026). INTELLIGENT SALES FORECASTING SYSTEM BASED ON ENSEMBLE MACHINE LEARNING MODELS AND SEMANTIC INTEGRATION OF CRM AND ERP DATA. Herald of Khmelnytskyi National University. Technical Sciences, 365(3), 649-655. https://doi.org/10.31891/2307-5732-2026-365-91