DATALOGIC MODEL FOR IDENTIFYING GENDER BY SVM-ANALYSIS OF INTERNET POSTS USING OBJECT-ORIENTED PROJECTING
DOI:
https://doi.org/10.31891/2307-5732-2024-337-3-29
Keywords:
gender identity, internet posts, tweets, object-oriented model, data logic model, cisgender, tolerant environment, gender, SVM
Abstract
The article proposes a practical approach to gender identification based on the analysis of Internet posts using SVM classifiers. The input data of the approach are a set of SVM classifiers trained on English-language data, the corresponding vectorizers used during training, and the social media post to be analyzed. The zero step is to select the gender category to analyze and to load the corresponding SVM classifier model together with its vectorizer. The first step is to pre-process the social media post, which includes removing stop characters and stop words and checking the language of the post. If the post is not written in English, it is automatically translated into English. The second step is the vectorization of the pre-processed post, and the third step is its classification by the trained SVM classifier. The fourth step is the presentation of conclusions to the user in the original language of the post, since the classifier itself works with English-language data. The output of the approach is an assessment of the post's belonging to the gender selected for analysis.
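The steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the class names (`PostAnalyzer`, `CountVectorizerStub`, `LinearSVMStub`), the stop-word list, the report templates, the toy vocabulary and the weights are all hypothetical, and language detection and translation are passed in as placeholder callables. The sketch also assumes that translation happens before stop-word removal, which the abstract does not specify. The classifier stub applies only the inference rule of an already trained linear SVM, the sign of w · x + b.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical stop-word list and report templates; the article specifies neither.
STOP_WORDS = {"a", "an", "and", "i", "in", "is", "of", "the", "to"}
REPORTS = {
    "en": "Score for '{gender}': {score:.2f}",
    "de": "Ergebnis '{gender}': {score:.2f}",
}


def preprocess(text: str) -> List[str]:
    """Step 1: drop stop characters (URLs, digits, punctuation) and stop words."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


@dataclass
class CountVectorizerStub:
    """Stand-in for the vectorizer stored with a trained model."""
    vocabulary: Dict[str, int]

    def transform(self, tokens: List[str]) -> List[float]:
        vec = [0.0] * len(self.vocabulary)
        for tok in tokens:
            if tok in self.vocabulary:
                vec[self.vocabulary[tok]] += 1.0
        return vec


@dataclass
class LinearSVMStub:
    """Inference rule of an already trained linear SVM: w . x + b."""
    weights: List[float]
    bias: float

    def decision_function(self, x: List[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias


@dataclass
class PostAnalyzer:
    models: Dict[str, Tuple[CountVectorizerStub, LinearSVMStub]]
    detect_language: Callable[[str], str]
    translate_to_english: Callable[[str], str]

    def analyze(self, post: str, gender: str) -> dict:
        vectorizer, classifier = self.models[gender]      # step 0: load model
        lang = self.detect_language(post)                 # step 1: language check
        english = post if lang == "en" else self.translate_to_english(post)
        tokens = preprocess(english)                      # step 1: cleaning
        features = vectorizer.transform(tokens)           # step 2: vectorization
        score = classifier.decision_function(features)    # step 3: classification
        template = REPORTS.get(lang, REPORTS["en"])       # step 4: report language
        return {
            "belongs": score > 0,
            "score": score,
            "report": template.format(gender=gender, score=score),
        }


# Toy model with made-up weights, only to exercise the pipeline.
toy_models = {
    "category_a": (
        CountVectorizerStub({"alpha": 0, "beta": 1}),
        LinearSVMStub(weights=[1.0, -0.5], bias=-0.5),
    )
}
analyzer = PostAnalyzer(
    models=toy_models,
    detect_language=lambda post: "en",       # placeholder detector
    translate_to_english=lambda post: post,  # placeholder translator
)
result = analyzer.analyze("Alpha alpha, beta! http://t.co/x 123", "category_a")
print(result["report"])  # Score for 'category_a': 1.00
```

In a real system the two stubs would be replaced by the trained vectorizer and SVM model loaded from disk, and the two callables by an actual language detector and translation service. Keeping them as injected dependencies leaves the step order visible and testable on its own.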
To investigate the effectiveness of the proposed method, an object-oriented software implementation was created in the PyCharm development environment, and data logic modeling of the data structure was also performed. The developed approach showed high efficiency: compared with the existing analogue, its accuracy is higher by 0.11. An advantage of the method is its ability to work with short texts, such as tweets, without losing accuracy. The obtained results can be relevant for a variety of applications, including marketing research, public opinion analysis, personalized advertising, and political research, and can contribute to the creation of safe and tolerant web environments.