EFFICIENCY RESEARCH OF CLUSTER ANALYSIS METHODS FOR DETECTING OUTLIERS IN REAL ESTATE MARKET
DOI:
https://doi.org/10.31891/2307-5732-2025-351-45
Keywords:
anomaly detection, outlier detection, real estate, unsupervised learning, data mining
Abstract
This work presents a comprehensive assessment of existing unsupervised outlier detection methods on a real dataset. It examines 23 algorithms built on four types of models: probabilistic, linear, proximity-based, and graph-based. The dataset was prepared using software developed for real estate agencies. The study covers the real estate market of Ternopil, specifically the sale of apartments and rooms. The prepared dataset contains 760 real estate objects with 12 features. For each object, an expert incrementally assigned an anomaly label based on its characteristics, which made it possible to form datasets in which 10, 15, 20, and 25% of the objects are labeled anomalous. The algorithms were tested with two methods of encoding categorical features: Label Encoding and One-Hot Encoding. The dataset was scaled with RobustScaler, which is resistant to outliers. Results were evaluated by three indicators: AUC-ROC, Precision @ Rank n, and algorithm execution time. These indicators make it possible to assess the accuracy and efficiency of the algorithms and to determine their suitability for real-world anomaly detection in real estate data. The results were visualized with the t-SNE dimensionality reduction method, which made it possible to assess how well each model clusters normal and anomalous objects. The work also examines in more detail the stability of the results of nondeterministic anomaly detection algorithms, presents diagrams of the range of their metrics, and describes the possibility of their practical use.
Overall, this study covers modern models and algorithms for anomaly detection, the stages of information processing and analysis, and data visualization technologies. It contributes to the improvement of machine-learning-based methodologies in the real estate sector, supports informed decision-making in real estate market analysis, and provides recommendations to stakeholders.
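The evaluation steps named in the abstract (categorical encoding, RobustScaler scaling, scoring by AUC-ROC, Precision @ Rank n, and execution time) can be illustrated with a minimal sketch. Everything specific in it is an assumption rather than the authors' setup: the data is synthetic (three hypothetical features: area, price, district), a simple k-nearest-neighbor distance score stands in for a single proximity-based detector, scikit-learn's OrdinalEncoder plays the role of Label Encoding for features, and only the numeric columns are scaled. It does not reproduce the 23 algorithms or the Ternopil dataset.

```python
import time

import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, RobustScaler


def precision_at_rank_n(y_true, scores, n=None):
    """Share of true anomalies among the n highest-scored objects.

    By default n equals the number of true anomalies.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    if n is None:
        n = int(y_true.sum())
    top = np.argsort(scores)[::-1][:n]
    return float(y_true[top].mean())


def make_synthetic_listings(n_objects=760, contamination=0.10, seed=0):
    """Hypothetical listings; anomalies have prices far out of line with area."""
    rng = np.random.default_rng(seed)
    n_anomalies = int(n_objects * contamination)
    y = np.zeros(n_objects, dtype=int)
    y[:n_anomalies] = 1
    area = rng.normal(55.0, 12.0, n_objects)
    price = area * rng.normal(900.0, 60.0, n_objects)
    price[:n_anomalies] *= rng.uniform(3.0, 10.0, n_anomalies)
    district = rng.choice(["center", "east", "north"], n_objects)
    return np.column_stack([area, price]), district.reshape(-1, 1), y


def build_features(numeric, categorical, encoding):
    """Encode the categorical column and robust-scale the numeric ones."""
    if encoding == "one-hot":
        encoded = OneHotEncoder().fit_transform(categorical).toarray()
    elif encoding == "label":
        encoded = OrdinalEncoder().fit_transform(categorical)
    else:
        raise ValueError(f"unknown encoding: {encoding}")
    scaled = RobustScaler().fit_transform(numeric)
    return np.hstack([scaled, encoded])


def knn_scores(X, k=5):
    """Mean distance to the k nearest neighbors; larger means more anomalous."""
    # Calling kneighbors() without arguments excludes each point itself.
    distances, _ = NearestNeighbors(n_neighbors=k).fit(X).kneighbors()
    return distances.mean(axis=1)


def evaluate(contamination, encoding):
    """Return AUC-ROC, Precision @ Rank n, and scoring time for one setting."""
    numeric, categorical, y = make_synthetic_listings(contamination=contamination)
    X = build_features(numeric, categorical, encoding)
    start = time.perf_counter()
    scores = knn_scores(X)
    elapsed = time.perf_counter() - start
    return {
        "auc": roc_auc_score(y, scores),
        "p_at_n": precision_at_rank_n(y, scores),
        "seconds": elapsed,
    }


if __name__ == "__main__":
    for encoding in ("label", "one-hot"):
        for contamination in (0.10, 0.15, 0.20, 0.25):
            result = evaluate(contamination, encoding)
            print(encoding, contamination, result)
```

Because the labels enter only at the scoring step, the detector itself stays unsupervised; the same loop could wrap any other detector that returns one anomaly score per object.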
License
Copyright (c) 2025 ОЛЕГ ПАСТУХ, ВІКТОР ХОМИШИН (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.