PROSPECTS OF USE AND THE SPEED ASSESSMENT OF THE MLPACK LIBRARY IN REMOTE SENSING OF THE EARTH DATA PROCESSING TASKS

IGOR GARKUSHA; DENYS IVANOV

doi:10.31891/2307-5732-2025-359-13

Authors

IGOR GARKUSHA, Dnipro University of Technology, https://orcid.org/0000-0003-1190-1501
DENYS IVANOV, Dnipro University of Technology, https://orcid.org/0000-0001-8660-0928

Keywords:

mlpack, scikit-learn, Sentinel-2, Random Forest, machine learning, supervised classification

Abstract

The article reviews how two modern machine learning libraries, mlpack and scikit-learn, implement selected algorithms and compares their performance. Two tasks are considered: k-nearest neighbor (k-NN) search and supervised classification with the Random Forest algorithm. The k-NN comparison uses the Covertype dataset from the well-known repository of the University of California, Irvine. The dataset for the Random Forest comparison was built from multispectral, multiband imagery acquired by the Sentinel-2A Earth remote sensing satellite. Certain features of working with mlpack and with Armadillo, the library on which mlpack relies, are presented; in particular, the way Armadillo's functions load formatted data is described. The paper describes the preparation of a dataset for training a classifier: the generalized processing steps are presented, the land cover types of the survey area are listed, and solutions are proposed for improving the separability of the data in feature space. When the dataset was created, vegetation (multispectral) indices were calculated, in particular the normalized difference vegetation index (NDVI), the normalized difference water index (NDWI), and the inverse red-green index (simple ratio G/R), together with the principal components PC1 and PC2 obtained by Principal Component Analysis. The test data made up 20% of the dataset formed from these features. The article reports the classification accuracy of the Random Forest algorithm for specific model hyperparameters and estimates the time required for training and for classifying the test dataset.
The results of the study are the classification times measured for programs written in C++ with mlpack and in Python with scikit-learn.
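
The feature construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes per-pixel reflectance values for the green, red, and near-infrared bands (Sentinel-2 bands B3, B4, and B8), and it assumes the McFeeters form of NDWI, (Green - NIR) / (Green + NIR), because the abstract does not state which NDWI variant was used. The function names `normalized_difference`, `spectral_features`, and `split_train_test` are hypothetical, and the PCA components PC1 and PC2 are omitted.

```python
import random


def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def spectral_features(green, red, nir):
    """Compute index features for one pixel from band reflectances.

    Assumption: green, red, nir correspond to Sentinel-2 bands B3, B4, B8.
    """
    return {
        # NDVI = (NIR - Red) / (NIR + Red)
        "ndvi": normalized_difference(nir, red),
        # NDWI in the McFeeters form (assumed; the variant is not given in the source)
        "ndwi": normalized_difference(green, nir),
        # Simple ratio G/R
        "gr_ratio": green / red if red != 0 else 0.0,
    }


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and hold out test_fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Illustrative reflectance triples (green, red, nir); not data from the study.
    pixels = [(0.10, 0.05, 0.45), (0.08, 0.07, 0.03), (0.12, 0.15, 0.25),
              (0.09, 0.04, 0.50), (0.11, 0.10, 0.20)]
    rows = [spectral_features(g, r, n) for g, r, n in pixels]
    train, test = split_train_test(rows, test_fraction=0.2)
    print(len(train), "training rows,", len(test), "test rows")
```

Rows built this way, with class labels attached, would form the kind of feature table that a Random Forest implementation in either library takes as input; the 20% hold-out mirrors the test share stated in the abstract.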
