EXPERIMENTS AND USED EVALUATION METRICS USED IN THE DEVELOPMENT OF A LANGUAGE-INDEPENDENT INCREASE DETECTOR

Authors

N. Pravorska, N. Hrypynska

DOI:

https://doi.org/10.31891/2307-5732-2022-309-3-44-49

Keywords:

language-independent detector, incremental approach, locality-sensitive hashing, experiment, evaluation metrics

Abstract

Experiments and evaluation metrics play an important role in the development of a language-independent incremental detector (MNIDP): they make it possible to analyze the results of the development and to judge the suitability of the developed algorithm and tool. The experiments also serve to evaluate the performance of the developed detector and to compare it with the commercial SIG approach to clone detection, in order to explore the benefits that an incremental approach can offer. To get an idea of the performance of MNIDP, it is proposed to run it on five open-source software systems, measuring its time and memory requirements. In addition, to assess whether the initial approach can be extended and improved with locality-sensitive hashing (LSH), the performance of the proposed LSH-based extension must be measured and compared with that of MNIDP. The experiments conducted in the study provided useful information on the effectiveness of the proposed LSH-based extension. More specifically, compared to the MNIDP implementation, the index creation stage of the LSH-based approach was in some cases two, and in some cases three, times slower. A possible reason is the cost of the MinHash operation, which is a significant part of the overall LSH scheme: during MinHashing, every shingle in every shingle set must be hashed by each of k hash functions. It was assumed that the incremental step based on MNIDP would be much slower, because index records are calculated on the fly. However, the study produced the opposite result. In practice, this is explained by the fact that the similarity threshold used did not produce a large number of matches between the source files. To better understand this behavior, further research is needed into the relationship between the similarity threshold and the runtime of the LSH-based incremental step.
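
The cost argument in the abstract can be illustrated with a minimal MinHash sketch. It is not the authors' implementation: the function names (`shingles`, `minhash_signature`, `estimated_similarity`), the shingle size, and the use of seeded BLAKE2b as the family of k hash functions are all assumptions made for illustration. The sketch shows why index creation is expensive: each shingle (the "tile" of a source file) is hashed once per hash function, so a file with n shingles costs k × n hash evaluations.

```python
import hashlib


def shingles(tokens, size=3):
    """Return the set of contiguous token windows (shingles) of a token list."""
    if len(tokens) < size:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def _seeded_hash(shingle, seed):
    """Hypothetical hash family: BLAKE2b over the seed and the shingle text."""
    data = (str(seed) + "\x00" + "\x00".join(shingle)).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, k=64):
    """Build a k-entry signature: the minimum hash per seeded hash function.

    Every shingle is hashed k times, which is the cost discussed in the abstract.
    """
    if not shingle_set:
        raise ValueError("cannot build a signature for an empty shingle set")
    return [min(_seeded_hash(s, seed) for s in shingle_set) for seed in range(k)]


def estimated_similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    file_a = "int x = 0 ; x = x + 1 ; return x ;".split()
    file_b = "int y = 0 ; x = x + 1 ; return x ;".split()
    sig_a = minhash_signature(shingles(file_a))
    sig_b = minhash_signature(shingles(file_b))
    print(f"estimated similarity: {estimated_similarity(sig_a, sig_b):.2f}")
```

A pair of files would be reported as a clone candidate only if the estimated similarity reaches the chosen threshold, so a high threshold yields few matches and little follow-up work in the incremental step.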

Published

2022-05-26

How to Cite

PRAVORSKA, N., & HRYPYNSKA, N. (2022). EXPERIMENTS AND USED EVALUATION METRICS USED IN THE DEVELOPMENT OF A LANGUAGE-INDEPENDENT INCREASE DETECTOR. Herald of Khmelnytskyi National University. Technical Sciences, 309(3), 44-49. https://doi.org/10.31891/2307-5732-2022-309-3-44-49