DUAL-LEVEL STRATEGY FOR ENHANCING RTOS FAULT TOLERANCE USING PROBABILISTIC ANALYSIS
DOI:
https://doi.org/10.31891/2307-5732-2025-353-60
Keywords:
operating systems, probabilistic analysis, watchdog timer, fault tolerance, soft reset, cyber-physical systems, reliability, real-time systems
Abstract
The paper proposes a two-level fault-tolerance model for real-time operating systems that combines a hardware watchdog timer with a software module implementing probabilistic monitoring and proactive component recovery. Unlike classical approaches, which mostly react only after critical failures occur, the developed mechanism focuses on early detection of system performance degradation. Failure risk is assessed through probabilistic calculations that take into account task response times, message queue occupancy, the stability of heartbeat signals, and other indicators. When potentially dangerous deviations are detected, the system initiates a local restart of individual tasks or drivers before the state approaches a critical level. This enables timely stabilization of operation, prevents error accumulation, reduces the load on the microcontroller, and significantly decreases the number of full system reboots.
Integration of the developed model with popular real-time operating systems, such as FreeRTOS, simplifies its implementation in existing embedded solutions and ensures high compatibility with hardware platforms from various manufacturers. The proposed strategy is particularly effective for systems with limited computing resources, such as autonomous robotic complexes, industrial controllers, unmanned aerial vehicles, and Internet of Things devices. Experimental studies have shown that applying the two-level model can reduce the system’s average downtime, decrease the number of global restarts severalfold, and improve overall operational reliability without significantly increasing CPU and memory usage. The obtained results confirm the feasibility of using the proposed approach as a flexible and efficient means of enhancing the fault tolerance of modern cyber-physical systems.
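The abstract does not give the risk formula, so the following is only a minimal sketch of how the software level might turn the monitored indicators into a recovery decision. It assumes a noisy-OR combination of three normalized indicators; the structure `task_health_t`, the functions `failure_risk` and `choose_action`, the weights, and both thresholds are hypothetical and are not taken from the paper.

```c
#include <stdint.h>

/* Hypothetical per-task health sample; all field names are illustrative. */
typedef struct {
    uint32_t response_us;     /* last measured response time */
    uint32_t deadline_us;     /* response-time budget of the task */
    uint32_t queue_used;      /* messages currently waiting in the queue */
    uint32_t queue_capacity;  /* total queue length */
    uint32_t missed_beats;    /* heartbeats missed in the observation window */
    uint32_t expected_beats;  /* heartbeats expected in the same window */
} task_health_t;

typedef enum {
    ACTION_NONE,           /* task looks healthy, keep running */
    ACTION_LOCAL_RESTART,  /* software level: restart only this task or driver */
    ACTION_FULL_RESET      /* hardware level: stop feeding the watchdog */
} recovery_action_t;

/* Normalize num/den into [0, 1]; a zero denominator is treated as worst case. */
static double ratio01(uint32_t num, uint32_t den)
{
    if (den == 0u) {
        return 1.0;
    }
    double r = (double)num / (double)den;
    return (r > 1.0) ? 1.0 : r;
}

/* Assumed weights: how strongly each indicator alone suggests a failure. */
#define W_RESPONSE  0.5
#define W_QUEUE     0.4
#define W_HEARTBEAT 0.8

/* Noisy-OR: risk = 1 - product of (1 - p_i), treating indicators as independent. */
double failure_risk(const task_health_t *h)
{
    double p_resp  = W_RESPONSE  * ratio01(h->response_us,  h->deadline_us);
    double p_queue = W_QUEUE     * ratio01(h->queue_used,   h->queue_capacity);
    double p_beat  = W_HEARTBEAT * ratio01(h->missed_beats, h->expected_beats);
    return 1.0 - (1.0 - p_resp) * (1.0 - p_queue) * (1.0 - p_beat);
}

/* Assumed thresholds separating the two recovery levels. */
#define RISK_LOCAL_RESTART 0.50
#define RISK_FULL_RESET    0.85

recovery_action_t choose_action(double risk)
{
    if (risk >= RISK_FULL_RESET) {
        return ACTION_FULL_RESET;
    }
    if (risk >= RISK_LOCAL_RESTART) {
        return ACTION_LOCAL_RESTART;
    }
    return ACTION_NONE;
}
```

In a FreeRTOS port, a low-priority monitor task could fill one `task_health_t` per supervised task each period and act on the result, for example by deleting and recreating the affected task on `ACTION_LOCAL_RESTART` and by withholding the watchdog refresh on `ACTION_FULL_RESET`. The thresholds would need tuning on the target so that local restarts happen well before the hardware timer expires.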
License
Copyright (c) 2025 ОЛЕКСАНДР КОЗЕЛЬСЬКИЙ, БОГДАН САВЕНКО, ОЛЕГ САВЕНКО (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.