METHOD FOR INTERPRETING THE RESULTS OF CYBERBULLYING DETECTION IN TEXTUAL CONTENT BY MEANS OF ARTIFICIAL INTELLIGENCE
DOI:
https://doi.org/10.31891/2307-5732-2024-343-6-45
Keywords:
cyberbullying, neural networks, interpretation of results, BERT, LIME
Abstract
The article proposes a method for interpreting the results of cyberbullying detection in textual content by means of artificial intelligence. The method is intended to explain the decisions of a neural network model about the types of cyberbullying identified in a text. Its originality lies in interpreting the results separately for each detected type of cyberbullying, which is achieved by combining a multi-label classifier based on the transformer architecture with a model for interpreting machine learning models. A trained BERT model performs multi-label classification of the input text sample, detecting the different types of cyberbullying present and reporting a percentage for each. For visual interpretation of the detection results, the method uses LIME (Local Interpretable Model-agnostic Explanations), which visualizes the influence of individual words on the model's decision about whether the text belongs to each type of cyberbullying.
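The per-type interpretation described above can be illustrated with a minimal sketch. It is not the authors' implementation: a toy keyword scorer stands in for the trained BERT model, and a simple leave-one-word-out (occlusion) score stands in for LIME, which instead fits a local surrogate model on perturbed samples. The label names, word weights, and bias are invented for illustration.

```python
import math

# Hypothetical label set; the source does not list the cyberbullying types.
LABELS = ["insult", "threat"]

# Toy stand-in for the trained BERT classifier: invented per-label word weights.
TOY_WEIGHTS = {
    "insult": {"stupid": 2.5, "loser": 2.0},
    "threat": {"hurt": 3.0, "find": 1.0},
}
BIAS = -2.0


def predict_proba(words):
    """Return one independent sigmoid probability per label (multi-label)."""
    probs = {}
    for label in LABELS:
        logit = BIAS + sum(TOY_WEIGHTS[label].get(w.lower(), 0.0) for w in words)
        probs[label] = 1.0 / (1.0 + math.exp(-logit))
    return probs


def occlusion_weights(text):
    """Per-label word influence: drop in probability when the word is removed."""
    words = text.split()
    base = predict_proba(words)
    weights = {label: [] for label in LABELS}
    for i, word in enumerate(words):
        reduced = predict_proba(words[:i] + words[i + 1:])
        for label in LABELS:
            weights[label].append((word, base[label] - reduced[label]))
    return base, weights


base, weights = occlusion_weights("you stupid loser I will find and hurt you")
for label in LABELS:
    print(f"{label}: {100 * base[label]:.1f}%")
    top = sorted(weights[label], key=lambda kv: abs(kv[1]), reverse=True)[:3]
    print("  most influential:", top)
```

Because each label has its own sigmoid output, the percentages need not sum to 100, and each label gets its own list of word weights. This is what allows a separate explanation for every detected type.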
The developed method provides three views of the detection results: a color palette, diagrams of the local importance of words, and diagrams of the overall importance of words. In the color palette view, the absolute value of a word's weight determines the brightness of its color: the brightest color marks the word with the greatest influence on the model's decision and the least bright color marks the smallest influence, regardless of whether that influence was positive or negative. The local importance view is built from diagrams showing how individual words of the text affect the probability of assigning that text to a specific type of cyberbullying, so the reader can see the weight the model gives each word according to its contribution to the decision. The overall importance view is built from a set of 10 words that the model considers important regardless of the specific type of cyberbullying.
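The color palette and overall importance views can be sketched as follows. The per-label weights are invented, and ranking words by the sum of absolute weights across labels is an assumption: the source states only that 10 words are selected regardless of cyberbullying type, not how they are ranked.

```python
from collections import defaultdict

# Invented per-label word weights, in the shape an explainer might return.
explanations = {
    "insult": {"stupid": 0.42, "loser": 0.30, "you": 0.05, "please": -0.10},
    "threat": {"hurt": 0.61, "find": 0.15, "you": 0.02, "please": -0.20},
}


def brightness(weights):
    """Map each word to a 0..1 brightness from |weight|, ignoring the sign."""
    peak = max(abs(w) for w in weights.values()) or 1.0
    return {word: abs(w) / peak for word, w in weights.items()}


def overall_top_words(per_label, k=10):
    """Rank words by summed |weight| across all labels and keep the top k.

    The summing rule is an assumption, not taken from the source.
    """
    totals = defaultdict(float)
    for weights in per_label.values():
        for word, w in weights.items():
            totals[word] += abs(w)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:k]


shades = {label: brightness(w) for label, w in explanations.items()}
top10 = overall_top_words(explanations, k=10)
print(shades["threat"])
print(top10)
```

A word with a negative weight, such as "please" here, still receives a visible shade, because only the magnitude of the weight drives the brightness. The local importance diagrams would plot the same per-label weights with their signs kept.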
The experiments showed that the method interprets the results of neural network detection of cyberbullying at a level sufficient for a person to understand which features of the text led the artificial intelligence to its decision about the detected types of cyberbullying.