APPLICATION OF NEURAL NETWORK APPROACHES TO SOLVE THE MULTI-ARMED BANDIT PROBLEM
DOI: https://doi.org/10.31891/2307-5732-2023-327-5-132-138
Keywords: recurrent neural network, multi-armed bandit, prediction, algorithm effectiveness
Abstract
Many people who want to start investing do not know how to take the first step: they lack guidance on where to begin and on which companies' stocks can be traded profitably. This article analyzes and compares eight basic algorithms for solving the multi-armed bandit problem. To this end, a research environment was designed and developed that allows algorithm behavior to be observed over a simulated period of seven years. The environment closely resembles real-world conditions, which makes it possible to analyze agent behavior in the simulation and draw conclusions about the effectiveness of each algorithm.
A new modification of the greedy agent was created which, instead of using its own estimates, relies on predictions produced by recurrent neural networks. The proposed approach combines the capabilities of artificial intelligence with traditional algorithms for the multi-armed bandit problem. The effectiveness of each algorithm and its suitability for assessing investment attractiveness were analyzed. The results of the experiments are presented in a clear analytical form.
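The modified greedy agent described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `PredictiveGreedyAgent`, its methods, and the `last_value_predictor` function are hypothetical, and the simple placeholder predictor stands in for a trained recurrent network, which is not reproduced here.

```python
from typing import Callable, List, Sequence

# Hypothetical type: maps one arm's reward history to a predicted next reward.
# In the article this role is played by a recurrent neural network.
Predictor = Callable[[Sequence[float]], float]


class PredictiveGreedyAgent:
    """Greedy agent whose arm estimates come from an external predictor
    instead of the agent's own running averages (illustrative sketch)."""

    def __init__(self, n_arms: int, predictor: Predictor) -> None:
        self.predictor = predictor
        self.histories: List[List[float]] = [[] for _ in range(n_arms)]

    def observe(self, arm: int, reward: float) -> None:
        # Record the reward observed for the given arm.
        self.histories[arm].append(reward)

    def select_arm(self) -> int:
        # Ask the predictor for each arm's expected next reward,
        # then act greedily on those predictions.
        estimates = [self.predictor(h) for h in self.histories]
        return max(range(len(estimates)), key=estimates.__getitem__)


def last_value_predictor(history: Sequence[float]) -> float:
    # Placeholder assumption: predict that the next reward equals the last one.
    # A trained recurrent model would replace this function.
    return history[-1] if history else 0.0


agent = PredictiveGreedyAgent(n_arms=3, predictor=last_value_predictor)
for arm, reward in enumerate([0.1, 0.5, -0.2]):
    agent.observe(arm, reward)
chosen = agent.select_arm()
```

Because the predictor is passed in as a plain callable, the selection rule stays the same no matter which forecasting model supplies the estimates.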
The best algorithm from each domain was selected: UCB, and the greedy agent whose estimates are produced by a GRU-based recurrent neural network. Other algorithms that require no prior knowledge of the environment yet still yield a decent profit were also analyzed.
The best results were obtained with UCB and with the greedy agent whose estimates are produced by a GRU-based recurrent neural network. Although the profit obtained with UCB was three times that of the GRU agent, the probability of choosing the UCB trust (confidence) parameter correctly is very low. Depending on their needs, potential users can therefore choose either approach, keeping in mind the risk involved in using UCB.
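To show why the choice of this parameter matters, here is a minimal sketch of a common textbook form of UCB, in which a coefficient `c` scales the exploration bonus. It assumes that `c` corresponds to the trust parameter mentioned above. The class `UCBAgent` and the helper `third_choice` are hypothetical names, not taken from the article.

```python
import math
from typing import List


class UCBAgent:
    """Textbook UCB sketch: score = mean reward + c * sqrt(ln t / n)."""

    def __init__(self, n_arms: int, c: float) -> None:
        self.c = c                               # exploration (confidence) coefficient
        self.counts: List[int] = [0] * n_arms    # pulls per arm
        self.values: List[float] = [0.0] * n_arms  # running mean reward per arm
        self.t = 0                               # total number of pulls

    def select_arm(self) -> int:
        # Try every arm once before applying the UCB formula.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        scores = [
            v + self.c * math.sqrt(math.log(self.t) / n)
            for v, n in zip(self.values, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        self.t += 1
        self.counts[arm] += 1
        # Incremental update of the running mean.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def third_choice(c: float) -> int:
    # Same reward sequence, different c: arm 0 pays 1.0, arm 1 pays 0.0.
    agent = UCBAgent(n_arms=2, c=c)
    agent.update(0, 1.0)
    agent.update(1, 0.0)
    agent.update(0, 1.0)
    return agent.select_arm()


exploit_choice = third_choice(c=0.0)   # no exploration bonus
explore_choice = third_choice(c=10.0)  # large exploration bonus
```

With identical observations, the two settings of `c` lead the agent to different arms, which illustrates how sensitive the behavior is to this coefficient.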