COMPARISON OF THE EFFECTIVENESS OF RL ALGORITHMS FOR SAFE UAV OBSTACLE AVOIDANCE

Authors

M. Kopylets

DOI:

https://doi.org/10.31891/2307-5732-2025-355-33

Keywords:

DQN, PPO, UAV, collision avoidance

Abstract

An assessment of advanced decision-making techniques based on deep reinforcement learning has been performed for unmanned aerial vehicles operating in a two-dimensional virtual arena populated by a dense array of stationary obstacles. A custom simulation platform enables each drone to observe only a limited forward sector and to issue one of several discrete commands, each of which encodes both a change in heading and a forward translation step. This design faithfully reproduces the quantized control signals typical of onboard flight computers. Attention was directed toward two leading methods. One approach is built on value-based updates through a deep Q-network structure while the other relies on policy-gradient optimization under the proximal policy optimization paradigm. Both families of agents were trained under identical conditions, after which their capacity to produce collision-free flight paths was evaluated using a collection of qualitative measures. Those measures included the ability to adapt when the arrangement of obstacles changes, the overall smoothness of the flight trajectory, the frequency of abrupt stops and sharp turns, and the consistency of learning progress over time.
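
To make the described setup concrete, the following is a minimal, hypothetical sketch of such an arena written against the Gymnasium API. The paper uses its own custom simulation platform, so the class name ForwardSectorArena, the sensor geometry, the discrete turn-and-step commands, and the reward values below are illustrative assumptions rather than the authors' implementation.

# Minimal sketch (assumptions throughout): a 2D arena with a limited
# forward observation sector and discrete commands that pair a heading
# change with a fixed forward translation step.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ForwardSectorArena(gym.Env):
    """Hypothetical 2D arena: the agent senses a short forward cone of
    range readings and picks one of several discrete turn-and-step commands."""

    # (heading change in radians, forward step) per discrete action
    ACTIONS = [(-0.4, 1.0), (-0.2, 1.0), (0.0, 1.0), (0.2, 1.0), (0.4, 1.0)]

    def __init__(self, size=50.0, n_obstacles=40, n_rays=9, sensor_range=8.0):
        super().__init__()
        self.size, self.sensor_range, self.n_rays = size, sensor_range, n_rays
        self.n_obstacles = n_obstacles
        self.action_space = spaces.Discrete(len(self.ACTIONS))
        # normalized distance to the nearest obstacle along each forward ray
        self.observation_space = spaces.Box(0.0, 1.0, shape=(n_rays,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.obstacles = self.np_random.uniform(2.0, self.size - 2.0,
                                                size=(self.n_obstacles, 2))
        self.pos = np.array([1.0, self.size / 2.0])
        self.heading = 0.0
        return self._observe(), {}

    def step(self, action):
        dtheta, dist = self.ACTIONS[action]
        self.heading += dtheta
        self.pos = self.pos + dist * np.array([np.cos(self.heading), np.sin(self.heading)])
        collided = np.min(np.linalg.norm(self.obstacles - self.pos, axis=1)) < 1.0
        out_of_bounds = np.any(self.pos < 0.0) or np.any(self.pos > self.size)
        reached_goal = self.pos[0] >= self.size - 1.0
        reward = -10.0 if (collided or out_of_bounds) else (10.0 if reached_goal else -0.01)
        terminated = bool(collided or out_of_bounds or reached_goal)
        return self._observe(), reward, terminated, False, {}

    def _observe(self):
        # limited forward sector: rays fanned around the current heading
        angles = self.heading + np.linspace(-0.6, 0.6, self.n_rays)
        dists = np.full(self.n_rays, self.sensor_range)
        rel = self.obstacles - self.pos
        for i, a in enumerate(angles):
            direction = np.array([np.cos(a), np.sin(a)])
            along = rel @ direction                                       # projection onto the ray
            lateral = np.abs(rel[:, 0] * direction[1] - rel[:, 1] * direction[0])  # offset from the ray
            hits = (along > 0.0) & (along < self.sensor_range) & (lateral < 1.0)
            if np.any(hits):
                dists[i] = np.min(along[hits])
        return (dists / self.sensor_range).astype(np.float32)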

Additional analysis examined how different activation functions, including the widely used rectified linear unit, its leaky variant, the hyperbolic tangent, and the logistic sigmoid, affect the speed at which each agent reaches reliable performance and the properties of the resulting motion patterns. It emerged that the policy-gradient agents generally maintain more uniform parameter updates and exhibit greater stability when navigating complex topologies. In contrast, the value-based agents excel at rapid, reactive maneuvers in narrow pathways, weaving through tightly clustered obstacles with minimal deviation from their intended course.
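
As an illustration of how such a comparison can be organized, the sketch below builds the same small multilayer perceptron with an interchangeable hidden nonlinearity; the layer widths and input/output sizes are assumptions and do not reflect the architecture reported in the paper.

# Illustrative sketch: one shared backbone, with only the hidden
# nonlinearity swapped between training runs.
import torch.nn as nn

def make_mlp(in_dim, out_dim, hidden=(64, 64), activation=nn.ReLU):
    """Small MLP whose hidden activation (ReLU, LeakyReLU, Tanh, Sigmoid, ...)
    is passed in as a parameter."""
    layers, prev = [], in_dim
    for width in hidden:
        layers += [nn.Linear(prev, width), activation()]
        prev = width
    layers.append(nn.Linear(prev, out_dim))
    return nn.Sequential(*layers)

# one candidate network per activation under comparison
candidates = {"relu": nn.ReLU, "leaky_relu": nn.LeakyReLU,
              "tanh": nn.Tanh, "sigmoid": nn.Sigmoid}
networks = {name: make_mlp(in_dim=9, out_dim=5, activation=act)
            for name, act in candidates.items()}

Keeping the backbone fixed and varying only the nonlinearity isolates the activation's effect on convergence speed and on the character of the learned motion.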

The overall findings suggest that when the primary objective is to maximize flight safety and minimize collision risk, configurations derived from proximal policy optimization combined with activation schemes that curb excessive gradient growth are most suitable. Conversely, missions that demand swift directional changes in confined environments are better served by the deep Q-network model paired with activation functions that allow unrestricted linear growth. These outcomes underscore the need for a holistic selection process when choosing both the learning algorithm and the neural network architecture, taking into account environmental complexity, reward structure, and hardware constraints. Future investigations should broaden the scope to three-dimensional scenarios, incorporate real-world sensor data, and validate the proposed methods through hardware-in-the-loop flight trials.
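
A hedged sketch of this practical pairing is given below using the Stable-Baselines3 library; the paper does not state which framework was used, and ForwardSectorArena refers to the hypothetical environment sketched earlier, so the configuration is illustrative rather than a reproduction of the reported experiments.

# Illustrative pairing only; framework choice and hyperparameters are assumptions.
import torch.nn as nn
from stable_baselines3 import DQN, PPO

env = ForwardSectorArena()  # hypothetical environment sketched earlier

# safety-first pairing: policy-gradient updates with a bounded activation
ppo_agent = PPO("MlpPolicy", env,
                policy_kwargs={"activation_fn": nn.Tanh}, verbose=0)

# tight-maneuver pairing: value-based updates with an unbounded activation
dqn_agent = DQN("MlpPolicy", env,
                policy_kwargs={"activation_fn": nn.ReLU}, verbose=0)

ppo_agent.learn(total_timesteps=100_000)
dqn_agent.learn(total_timesteps=100_000)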

Published

2025-08-28

How to Cite

KOPYLETS, M. (2025). COMPARISON OF THE EFFECTIVENESS OF RL ALGORITHMS FOR SAFE UAV OBSTACLE AVOIDANCE. Herald of Khmelnytskyi National University. Technical Sciences, 355(4), 221-226. https://doi.org/10.31891/2307-5732-2025-355-33