INVESTIGATING THE EFFECT OF DATA DISTRIBUTION DRIFT LEVEL ON REINFORCEMENT LEARNING AGENTS
DOI: https://doi.org/10.31891/2307-5732-2023-329-6-286-290

Keywords: reinforcement learning, data drift, time series, machine learning

Abstract
For regression and classification tasks, as discussed in the first section, new approaches for circumventing the problem of data drift are being actively researched. However, for reinforcement learning, the question of how strongly data drift affects the quality of an agent's actions remains open. For this study, SAC (Soft Actor-Critic), an RL agent trained to optimize voice quality parameters, was chosen. SAC is a state-of-the-art reinforcement learning algorithm designed for problems with a continuous action space; it extends the original Actor-Critic architecture with several key improvements that increase learning stability and sampling efficiency.

The SAC agent was trained on GSM network statistics received from base stations (BS). In total, the training data were collected from 1,044 base stations over 2.5 years. The statistics are daily averaged values of network characteristics at each base station. The agent makes decisions based on the current values of the following parameters: HR Usage Rate, TCH Blocking Rate, TCH Traffic, Number of Available TCH, and two thresholds that control the use of FR (Full Rate). The agent's goal is to set these thresholds so that blocking does not occur and BS resources are used as efficiently as possible.

A system was built to analyze reinforcement learning agents. The second section gives an activity diagram of the system, which collects agent quality data for diverse datasets with different drift levels. The system has two implementation variants, depending on whether the data is already known to the agent. For the case when the quality of the agent is evaluated on the training data, there are two options: drift estimation and collection of SHAP values for the agent. In the SHAP variant, the agent is loaded and approximated by a regression model to find the contribution of each feature to the agent's decisions. During this evaluation, data about the quality of the agent in the regions known to it is collected. After the SHAP values have been computed over all episodes, they are aggregated into a list of data features with their total absolute impact on the outcome of the agent's actions; this data is later used to calculate the weighted data drift. Once the quality of the agent on known data has been collected, it is aggregated by episodes and stored for further analysis.

The second use of the system on the agent's training data is to calculate agent quality while accounting for data drift. From the moment the dataset is split into episodes, the procedure is identical to the case with new data, so it is described in more detail in the variant for drifted data in the second section.
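To make the decision interface concrete, below is a minimal gym-style sketch of the environment implied above: the state consists of the four daily BS statistics plus the two current FR thresholds, and the action is the pair of thresholds. The class name, value ranges, and reward shaping are illustrative assumptions, not the paper's implementation.

```python
# Minimal gym-style sketch of the decision interface described above.
# Class name, value ranges, and the reward shaping are illustrative assumptions.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class BSThresholdEnv(gym.Env):
    """One episode replays the daily statistics of a single base station.

    Observation (assumed normalized to [0, 1]):
        [HR Usage Rate, TCH Blocking Rate, TCH Traffic,
         Number of Available TCH, FR threshold 1, FR threshold 2]
    Action: the two FR-usage thresholds.
    """

    def __init__(self, episode_records):
        super().__init__()
        self.records = np.asarray(episode_records, dtype=np.float32)  # (n_days, 6)
        self.t = 0
        self.observation_space = spaces.Box(0.0, 1.0, shape=(6,), dtype=np.float32)
        self.action_space = spaces.Box(0.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        return self.records[0].copy(), {}

    def step(self, action):
        # With logged data the effect of the thresholds on blocking cannot be
        # simulated, so the reward is only an illustrative shaping:
        # reward carried traffic, penalize blocking.
        blocking = float(self.records[self.t][1])
        traffic = float(self.records[self.t][2])
        reward = traffic - 10.0 * blocking
        self.t += 1
        terminated = self.t >= len(self.records)
        obs = self.records[min(self.t, len(self.records) - 1)].copy()
        obs[4:6] = action                    # thresholds chosen by the agent
        return obs, reward, terminated, False, {}
```

An environment of this shape can be trained with an off-the-shelf SAC implementation, for example `SAC("MlpPolicy", env)` from stable-baselines3.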
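The abstract does not give the exact aggregation or drift formulas, so the following is only a minimal sketch of how the described pipeline could look: the agent's policy is approximated by a regression surrogate, per-episode SHAP values are summed into per-feature absolute impacts, and those impacts weight a per-feature drift measure. The choice of surrogate model, the use of PSI as the drift measure, and all function names are assumptions made for illustration.

```python
# Illustrative sketch of the SHAP-weighted drift computation described above.
# The surrogate model, the PSI drift measure, and the function names are
# assumptions; the paper's exact formulas are not reproduced here.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

def fit_surrogate_and_explain(states, agent_actions):
    """Approximate the agent with a regression model and compute SHAP values.

    states: array (n_samples, n_features); agent_actions: array (n_samples,)
    with one scalar output of the policy (e.g., one of the two thresholds).
    """
    surrogate = GradientBoostingRegressor().fit(states, agent_actions)
    explainer = shap.TreeExplainer(surrogate)
    return explainer.shap_values(states)          # (n_samples, n_features)

def feature_importance(shap_values_per_episode):
    """Total absolute SHAP impact of every feature, summed over all episodes."""
    totals = np.zeros(shap_values_per_episode[0].shape[1])
    for episode_values in shap_values_per_episode:
        totals += np.abs(episode_values).sum(axis=0)
    return totals

def psi(expected, actual, bins=10):
    """Population Stability Index: an example per-feature drift measure."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e = np.histogram(expected, bins=edges)[0] / len(expected) + 1e-6
    a = np.histogram(actual, bins=edges)[0] / len(actual) + 1e-6
    return float(np.sum((a - e) * np.log(a / e)))

def weighted_drift(train_features, new_features, importance):
    """Per-feature drift weighted by SHAP-derived importance (columns = features)."""
    weights = importance / importance.sum()
    drifts = np.array([psi(train_features[:, j], new_features[:, j])
                       for j in range(train_features.shape[1])])
    return float(np.dot(weights, drifts))
```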