Evaluating Fault Tolerance in Distributed Systems using Predictive Analytics with Gated Recurrent Unit and Long Short-Term Memory Models

Faizan Ahmad

doi:10.52783/jisem.v10i27s.4421

Published: Mar 29, 2025

Keywords:

Fault tolerance, distributed system, deep learning, system reliability, Gated Recurrent Unit, Long Short-Term Memory

Faizan Ahmad, Mohd Haroon, Zeeshan Ali Siddiqui

Abstract

Fault tolerance is crucial for reliability in distributed systems, where minor disruptions can cascade into significant failures that cause downtime, lost productivity, and financial damage. The complexity and interdependencies of distributed systems make them particularly prone to faults, so robust fault-tolerant mechanisms are essential to meet the reliability demands of modern systems. Predictive analytics shifts fault management from reacting to faults after they occur to detecting and preventing them proactively. This study examines the integration of Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) models into predictive analytics frameworks to enhance fault tolerance in distributed systems. GRUs process sequential data efficiently, whereas LSTMs are particularly adept at capturing long-term dependencies, making these models well suited to analyzing historical fault patterns. The proposed approach leverages these models to identify critical failure indicators and predict faults with high accuracy. By enabling early detection of and response to potential failures, the models help prevent disruptions from escalating. Experimental results demonstrate that GRU- and LSTM-based models significantly reduce unexpected downtime through precise fault predictions. Real-time monitoring capabilities further support decision-making and preemptive fault handling, helping to maintain system reliability and performance. This study highlights the practical application of GRU and LSTM models in advancing fault tolerance in distributed environments. By offering a data-driven solution, the research improves fault prediction accuracy, strengthens system resilience, and enhances operational efficiency, addressing key challenges in distributed system management.
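
To make the contrast between the two recurrent units concrete, the following is a minimal illustrative sketch, not the authors' implementation. It runs one scalar GRU cell and one scalar LSTM cell over a short sequence of monitoring readings using the standard textbook gate equations. The weights, the `cpu_load` readings, and the `fault_score` readout are all hypothetical and untrained; they are assumptions chosen only to show how each cell carries state from one reading to the next.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h, p):
    """One scalar GRU step: two gates, a single hidden state."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])  # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    # One common convention; some references swap the roles of z and 1 - z.
    return (1.0 - z) * h + z * h_cand


def lstm_step(x, h, c, p):
    """One scalar LSTM step: three gates plus a separate cell state c."""
    f = sigmoid(p["wf"] * x + p["uf"] * h + p["bf"])  # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h + p["bi"])  # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h + p["bo"])  # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h + p["bg"])  # candidate value
    c_new = f * c + i * g  # the cell state carries longer-term memory
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def fault_score(h, w=1.0, b=0.0):
    """Hypothetical readout: squash the hidden state into (0, 1)."""
    return sigmoid(w * h + b)


# Hypothetical, untrained weights chosen only for illustration.
GRU_PARAMS = {
    "wz": 0.5, "uz": 0.1, "bz": 0.0,
    "wr": 0.4, "ur": 0.2, "br": 0.0,
    "wh": 0.9, "uh": 0.3, "bh": 0.0,
}
LSTM_PARAMS = {
    "wf": 0.5, "uf": 0.1, "bf": 1.0,
    "wi": 0.6, "ui": 0.2, "bi": 0.0,
    "wo": 0.7, "uo": 0.1, "bo": 0.0,
    "wg": 0.9, "ug": 0.3, "bg": 0.0,
}

# Hypothetical normalized load readings from a single node.
cpu_load = [0.21, 0.25, 0.24, 0.58, 0.83, 0.97]

h_gru = 0.0
h_lstm, c_lstm = 0.0, 0.0
for x in cpu_load:
    h_gru = gru_step(x, h_gru, GRU_PARAMS)
    h_lstm, c_lstm = lstm_step(x, h_lstm, c_lstm, LSTM_PARAMS)

print("GRU fault score: ", fault_score(h_gru))
print("LSTM fault score:", fault_score(h_lstm))
```

Because the weights are untrained, the printed scores carry no predictive meaning. The structural difference is the point: the GRU keeps a single state updated through two gates, while the LSTM maintains an additional cell state, which is what allows it to retain information over longer sequences. A practical model would use vector-valued states, learn its weights from labeled fault histories, and would typically be built with a deep learning framework.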

Issue

Vol. 10 No. 27s (2025)

Section

Articles

Journal of Information Systems Engineering and Management
