METHOD FOR DECOMPOSING MONOLITHIC INFORMATION SYSTEM ARCHITECTURES BASED ON GRAPH NEURAL NETWORK CLUSTERING
DOI:
https://doi.org/10.31891/
Keywords:
software architecture, microservices, monolith decomposition, graph neural networks, clustering
Abstract
The study introduces a method for decomposing monolithic information system architectures through clustering of node embeddings obtained from Graph Neural Networks (GNNs). The software is represented as a directed graph of code entities, where nodes correspond to business classes and edges capture import or invocation dependencies. A two-layer GraphSAGE model with mean aggregation generates embeddings that combine structural and contextual features. The resulting representations are clustered using k-means to delineate potential microservice boundaries.
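The pipeline described above (dependency graph, two-layer GraphSAGE with mean aggregation, k-means on the embeddings) can be illustrated with a minimal, dependency-free sketch. Everything concrete here is an assumption made for illustration: the class names, the three-dimensional feature vectors, the layer sizes, and the randomly initialised, untrained weights. The layer uses the additive form ReLU(W_self h_v + W_neigh mean(h_u)) followed by L2 normalisation, and aggregates over outgoing dependencies only; the source does not state which variant or training procedure the authors used.

```python
import math
import random

# Hypothetical toy data: class names, features, and dependencies are illustrative only.
FEATURES = {
    "OrderService":      [1.0, 0.0, 0.2],
    "OrderRepository":   [0.9, 0.1, 0.0],
    "ProductService":    [0.0, 1.0, 0.3],
    "ProductRepository": [0.1, 0.9, 0.0],
}
DEPENDS_ON = {  # directed edges: class -> classes it imports or invokes
    "OrderService": ["OrderRepository", "ProductService"],
    "ProductService": ["ProductRepository"],
}

def mean_vec(vectors, dim):
    """Element-wise mean of a list of vectors; zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def random_matrix(rows, cols, rng):
    """Untrained weights drawn uniformly from [-1, 1] (assumption: no training step)."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def sage_layer(H, neighbors, W_self, W_neigh):
    """One GraphSAGE-style layer with mean aggregation and L2 normalisation."""
    dim = len(next(iter(H.values())))
    out = {}
    for v, h in H.items():
        agg = mean_vec([H[u] for u in neighbors.get(v, [])], dim)
        z = [a + b for a, b in zip(matvec(W_self, h), matvec(W_neigh, agg))]
        z = [max(0.0, x) for x in z]                      # ReLU
        norm = math.sqrt(sum(x * x for x in z)) or 1.0    # avoid division by zero
        out[v] = [x / norm for x in z]
    return out

def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(pt, centroids[c])) for pt in points]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean_vec(members, len(points[0]))
    return labels

def embed_and_cluster(features, neighbors, k, hidden=4, out=3, seed=0):
    """Two stacked layers, then k-means on the resulting node embeddings."""
    rng = random.Random(seed)
    in_dim = len(next(iter(features.values())))
    h = features
    for d_in, d_out in [(in_dim, hidden), (hidden, out)]:
        h = sage_layer(h, neighbors,
                       random_matrix(d_out, d_in, rng),
                       random_matrix(d_out, d_in, rng))
    names = list(h)
    labels = kmeans([h[n] for n in names], k, rng)
    return h, dict(zip(names, labels))

if __name__ == "__main__":
    embeddings, clusters = embed_and_cluster(FEATURES, DEPENDS_ON, k=2)
    for name, cluster in clusters.items():
        print(name, "-> candidate service", cluster)
```

Because the weights are random rather than learned, the cluster assignments from this sketch are not meaningful in themselves; it only shows how neighbour information flows into each node's embedding before clustering.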
The method was evaluated on an open-source e-commerce monolith written in C#. The codebase comprises over 190 classes across five business domains: Products, Orders, Customers, Inventory, and Categories. Compared with a baseline that applies k-means to raw CodeBERT embeddings, the proposed approach achieved higher clustering quality, reaching a Silhouette score of 0.69 versus 0.24. Two additional metrics confirmed the results: Normalized Mutual Information (NMI = 0.74), reflecting the similarity between detected clusters and reference domains, and Adjusted Rand Index (ARI = 0.68), which corrects for chance agreement (averaged over ten independent runs). Both indicate stable and consistent alignment with the functional areas of the system.
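To make the two external metrics concrete, the following sketch computes ARI and NMI from a reference labelling and a predicted clustering. It is an illustration under stated assumptions, not the authors' evaluation code: the function names and the toy labels are hypothetical, and NMI is normalised by the arithmetic mean of the two entropies (the default in scikit-learn's `normalized_mutual_info_score`); the source does not say which normalisation was used.

```python
import math
from collections import Counter

def comb2(n):
    """Number of unordered pairs that can be drawn from n items."""
    return n * (n - 1) / 2

def adjusted_rand_index(true, pred):
    """Pair-counting agreement between two labellings, corrected for chance."""
    n = len(true)
    pairs, a, b = Counter(zip(true, pred)), Counter(true), Counter(pred)
    sum_ij = sum(comb2(c) for c in pairs.values())
    sum_a = sum(comb2(c) for c in a.values())
    sum_b = sum(comb2(c) for c in b.values())
    expected = sum_a * sum_b / comb2(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labellings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

def entropy(counts, n):
    """Shannon entropy (natural log) of a labelling given its cluster sizes."""
    return -sum((c / n) * math.log(c / n) for c in counts)

def normalized_mutual_info(true, pred):
    """Mutual information divided by the arithmetic mean of the two entropies."""
    n = len(true)
    pairs, a, b = Counter(zip(true, pred)), Counter(true), Counter(pred)
    mi = sum((c / n) * math.log(c * n / (a[t] * b[p]))
             for (t, p), c in pairs.items())
    denom = (entropy(a.values(), n) + entropy(b.values(), n)) / 2
    return mi / denom if denom > 0 else 1.0

if __name__ == "__main__":
    # Hypothetical reference domains and cluster ids for six classes.
    reference = ["Orders", "Orders", "Products", "Products", "Customers", "Customers"]
    predicted = [1, 1, 0, 0, 2, 2]
    print("ARI:", adjusted_rand_index(reference, predicted))
    print("NMI:", normalized_mutual_info(reference, predicted))
```

Both scores depend only on how items are grouped, not on the cluster ids themselves, which is why a clustering can be compared directly against named reference domains.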
Analysis of the resulting clusters shows that GNN-based embeddings capture not only syntactic but also semantic relations among program entities, producing interpretable partitions that correspond to domain modules. The method serves as a reproducible, data-driven aid for architectural refactoring and supports the transition of legacy monolithic systems toward microservice architectures. Future research aims to include dynamic call-graph data, version-history information, and community-detection algorithms to automate cluster number selection and improve scalability for large-scale projects.
License
Copyright (c) 2025 МАРКІЯН ШЕСТАКОВИЧ, ЮРІЙ ШАБАТУРА (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.