METHOD OF AUTOMATED DETECTION OF ARTICLE TERMS USING A DECISION TREE

Authors

DOI:

https://doi.org/10.31891/2307-5732-2023-319-1-338-343

Keywords:

virtual community, decision tree, IT industry, big data processing, analysis of the content of posts

Abstract

The number of users of virtual communities grows every day, and with it the volume of data generated as they communicate. This data can be valuable because it reflects not only the manufacturer's view of a product but also consumers' experience with it. However, virtual communities are weakly structured as sources of information and lean toward entertaining content, so they may contain data that carries no meaningful information. In addition, not all users who post data apply techniques that would make it easier to find. As a result, searching for target data takes considerable time. To improve data search, this article proposes a method that analyzes the content of posts and identifies keywords from a given subject area. The method is automated and relies on a previously developed dictionary of key phrases or regular expressions, each with a weighting coefficient indicating how strongly it belongs to a particular term. For each term, a decision tree is built that determines the weight of the term with respect to the content of a post or article.
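The dictionary-based scoring described above might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `TERM_DICTIONARY`, `term_weight`, and `detect_terms`, the example patterns and weights, and the 0.5 threshold are all assumptions made for the example, and the per-term decision tree is simplified here to a flat sum over matched patterns.

```python
import re

# Hypothetical dictionary: term -> list of (regular expression, weight).
# Patterns and weights are illustrative only; they are not taken from the paper.
TERM_DICTIONARY = {
    "decision tree": [
        (r"\bdecision[- ]tree(s)?\b", 0.5),
        (r"\brandom forest\b", 0.3),
        (r"\b(leaf|split) node(s)?\b", 0.2),
    ],
}


def term_weight(text, patterns):
    """Sum the weights of the patterns that occur at least once in the text."""
    lowered = text.lower()
    return sum(weight for pattern, weight in patterns if re.search(pattern, lowered))


def detect_terms(text, dictionary, threshold=0.5):
    """Return the terms whose weight for this text reaches the (assumed) threshold."""
    scores = {term: term_weight(text, patterns) for term, patterns in dictionary.items()}
    return {term: score for term, score in scores.items() if score >= threshold}


post = "We trained a decision tree and inspected every leaf node."
detected = detect_terms(post, TERM_DICTIONARY)
```

Here each pattern contributes its weight at most once per post, so the weight of a term stays between zero and one as long as the weights for that term sum to one.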

The method also takes into account the level at which a post sits in the discussion, since a discussion is a set of chronologically ordered posts. Posts at higher levels receive higher weighting coefficients in the calculation, and posts at lower levels receive lower ones. The key phrases identified for a given term are arranged in descending order of weight. At each level of the tree, the weights of the key phrases must sum to one. Data from the virtual communities was downloaded using a data consolidation technique, which led to the concept of a consolidated data store that collects data from disparate sources. The paper presents the weight calculation for one term using part of a post from the CodeProject community.
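One way to picture the level-dependent weighting and the sum-to-one constraint is the sketch below. It rests on several assumptions that the abstract does not confirm: the level coefficient is taken to be `1 / level`, key phrases are matched as plain substrings, and the per-post scores are combined as a coefficient-weighted average. The function names are hypothetical.

```python
def normalize(phrase_weights):
    """Rescale key-phrase weights to sum to one, ordered by descending weight."""
    total = sum(phrase_weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    ranked = sorted(phrase_weights.items(), key=lambda item: item[1], reverse=True)
    return {phrase: weight / total for phrase, weight in ranked}


def level_coefficient(level):
    """Assumed decay: level 1 (top of the discussion) gets 1.0, level 2 gets 0.5, ..."""
    return 1.0 / level


def discussion_weight(posts, phrase_weights):
    """Combine per-post scores into one term weight for the whole discussion.

    posts is a list of (level, text) pairs; the result lies between 0 and 1.
    """
    weights = normalize(phrase_weights)
    weighted_sum = 0.0
    coefficient_sum = 0.0
    for level, text in posts:
        lowered = text.lower()
        # Each key phrase found in the post contributes its normalized weight once.
        post_score = sum(w for phrase, w in weights.items() if phrase in lowered)
        coefficient = level_coefficient(level)
        weighted_sum += coefficient * post_score
        coefficient_sum += coefficient
    return weighted_sum / coefficient_sum if coefficient_sum else 0.0


raw_weights = {"leaf node": 1, "decision tree": 3}  # illustrative raw weights
posts = [(1, "A decision tree with one leaf node"), (2, "Thanks!")]
result = discussion_weight(posts, raw_weights)
```

With this choice of coefficient, a match in the opening post counts twice as much as the same match in a second-level reply; any other monotonically decreasing function of the level would fit the description equally well.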

Published

2023-04-27

How to Cite

SYNKO, A., & ZHEZHNYCH, P. (2023). METHOD OF AUTOMATED DETECTION OF ARTICLE TERMS USING A DECISION TREE. Herald of Khmelnytskyi National University. Technical Sciences, 319(2), 338-343. https://doi.org/10.31891/2307-5732-2023-319-1-338-343