MULTI-AGENT DEEP REINFORCEMENT LEARNING FRAMEWORK DESIGN FOR EFFICIENT SINGLE-INTERSECTION TRAFFIC LIGHT CONTROL

MYKHAILO LYTVYNENKO; LEONID REBEZYUK

doi:10.31891/2307-5732-2026-363-59

Authors

MYKHAILO LYTVYNENKO, Kharkiv National University of Radio Electronics, https://orcid.org/0000-0003-4487-8811
LEONID REBEZYUK, Kharkiv National University of Radio Electronics, https://orcid.org/0000-0001-8516-6584

DOI:

https://doi.org/10.31891/2307-5732-2026-363-59

Keywords:

cooperative reinforcement learning, partial observability, uncertainty, decentralized training and execution, traffic light control

Abstract

This paper reformulates single-intersection traffic light control as a cooperative Decentralized Partially Observable Markov Decision Process (Dec-POMDP), treating it as a minimal testbed for studying decentralized coordination under uncertainty rather than as a standalone optimization task. Multiple agents control disjoint signal groups using fine-grained primitive actions, emphasizing modularity, robustness to sensing limitations, and compatibility with legacy stage-based control systems. To enable coordination without explicit communication, we propose an extended observation space that includes both dynamic traffic features and structural intersection information, allowing passive coordination through shared physical signals. Building on this formulation, we introduce a decentralized multi-agent deep reinforcement learning framework that integrates recurrent value estimation to mitigate partial observability, distributional reinforcement learning to preserve multi-modal return structures arising from competing coordination equilibria, and hysteretic updates to stabilize decentralized learning dynamics. Primitive-action traffic signal control induces chain-like decision processes with stochastic outcomes, where naive exploration and mean-based value estimates often lead to premature convergence to suboptimal coordination strategies. The proposed uncertainty-aware framework explicitly addresses this challenge. Preliminary simulation experiments are used to analyze learning dynamics, equilibrium sensitivity, and coordination behavior. Rather than emphasizing performance superiority, the results illustrate the behavioral implications of the proposed reformulation and learning design. This work provides a principled framework for decentralized, uncertainty-aware traffic signal control and establishes a foundation for future extensions to scalable multi-intersection coordination.
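As an illustration of the hysteretic updates mentioned in the abstract, the following minimal sketch shows the general idea in its simplest scalar form: a value estimate is raised with a larger learning rate than the one used to lower it, so an agent is less quick to penalize an action when a poor outcome may have been caused by teammates' exploration. This is not the framework described above, which combines the idea with recurrent and distributional value estimation; the function name, parameter names, and rate values are assumptions chosen for illustration only.

```python
def hysteretic_update(q, target, lr_increase=0.1, lr_decrease=0.01):
    """Return the value estimate after one hysteretic update step.

    q            -- current value estimate for a state-action pair
    target       -- bootstrapped target (e.g. reward + discounted next value)
    lr_increase  -- hypothetical rate applied when the target exceeds q
    lr_decrease  -- hypothetical, smaller rate applied when it falls below q
    """
    delta = target - q
    # Optimistic asymmetry: move quickly toward better-than-expected targets,
    # slowly toward worse-than-expected ones.
    rate = lr_increase if delta >= 0 else lr_decrease
    return q + rate * delta


if __name__ == "__main__":
    print(hysteretic_update(0.0, 1.0))  # positive error, larger rate
    print(hysteretic_update(1.0, 0.0))  # negative error, smaller rate
```

Setting both rates equal recovers an ordinary temporal-difference update, which makes the asymmetry easy to ablate in experiments.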
