STUDY OF THE ALGORITHM FOR BUILDING A MODEL OF SENTIMENT ANALYSIS OF MESSAGES IN SOCIAL NETWORKS

Authors

DOI:

https://doi.org/10.31891/2307-5732-2024-333-2-66

Keywords:

social media message analysis, machine learning, logistic regression, sentiment analysis

Abstract

In modern conditions, where public sentiment must be monitored constantly, analyzing the tone of posts and the comments on them makes it possible to determine whether users like a product. For example, a bank can learn how customers rate the quality of its services from their feedback, and election candidates can investigate which of them is likely to receive more votes.

This article addresses the problem of developing a sentiment analysis algorithm for messages from social networks and its practical implementation using Python tools. Additionally, a classification of tools for conducting sentiment analysis of messages is presented. It is noted that the most effective tools are those based on dictionaries and rules, machine learning tools, and manual processing. Special attention is given to the relevant online services that perform such tasks, and a brief description of them is provided.

The data were provided by the YouScan service. This service is capable not only of collecting the necessary information for analysis but also of analyzing texts in Ukrainian.

However, in the context of a full-scale invasion and the aggressor's attempts to interfere in all spheres of life, such information should be confidential. Therefore, the possibility of information leakage should be minimized. In these circumstances, machine learning tools capable of operating on local resources should be used. The use of manual analysis is also possible, but it is not cost-effective.

Particular attention was paid to preparing data for the model, namely: cleaning messages of unnecessary characters, pictures, punctuation marks, emojis, etc.; tokenizing the text; and stemming the resulting vectors. In this paper, a Pipeline model with logistic regression was used as the main machine learning tool for solving the problem.
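The preparation steps named above (cleaning, tokenizing, stemming) can be illustrated with a minimal sketch that uses only the Python standard library. It is not the authors' implementation: the function names, the regular expressions, and especially the toy suffix list are assumptions made for illustration, and a real stemmer for Ukrainian or English text would be considerably more careful. The resulting token lists are the kind of input that a vectorizer followed by a logistic regression classifier would typically consume.

```python
import re

# Hypothetical suffix list, for illustration only (not from the article).
SUFFIXES = ("ing", "ed", "es", "s")

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Anything that is not a letter or whitespace: punctuation, emojis, digits, "_".
NON_LETTER_RE = re.compile(r"[^\w\s]|[\d_]")


def clean(message):
    """Lowercase the message and drop URLs, punctuation, emojis and digits."""
    text = URL_RE.sub(" ", message.lower())
    text = NON_LETTER_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text into tokens on whitespace."""
    return text.split()


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(message):
    """Run cleaning, tokenizing and stemming in sequence."""
    return [stem(token) for token in tokenize(clean(message))]
```

Because the cleaning step relies on Unicode-aware character classes, Cyrillic words pass through intact while emojis and punctuation are removed; only the stemming rules would need to be replaced for Ukrainian text.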

The effectiveness of the resulting model was evaluated on test data using the Precision and Recall metrics. It was found that the model evaluates a positive comment as a negative one in 22% of cases. To eliminate this drawback, it was proposed to increase the threshold for assigning a positive assessment from 0.5 to 0.67.
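To make the role of the decision threshold concrete, the following sketch computes Precision and Recall for the positive class at the two thresholds mentioned above, 0.5 and 0.67. The labels and probabilities are made-up illustrative values, not data from the article, and the helper names are hypothetical. With Precision = TP / (TP + FP) and Recall = TP / (TP + FN), the example only shows how moving the threshold changes which comments are labeled positive and how the two metrics respond.

```python
def classify(prob_positive, threshold=0.5):
    """Label a comment positive (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in prob_positive]


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example data (assumption): true labels and predicted
# probabilities of the positive class for six comments.
y_true = [1, 1, 1, 0, 0, 0]
probs = [0.90, 0.70, 0.55, 0.60, 0.40, 0.20]

for threshold in (0.5, 0.67):
    y_pred = classify(probs, threshold)
    precision, recall = precision_recall(y_true, y_pred)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, the higher threshold labels fewer comments as positive, which raises Precision for the positive class and lowers its Recall; how the trade-off plays out on real data depends on the model's probability distribution.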

Published

2024-04-25

How to Cite

STUDY OF THE ALGORITHM FOR BUILDING A MODEL OF SENTIMENT ANALYSIS OF MESSAGES IN SOCIAL NETWORKS. (2024). Herald of Khmelnytskyi National University. Technical Sciences, 333(2), 421-427. https://doi.org/10.31891/2307-5732-2024-333-2-66