Dynamic Fault Tolerance and Performance Optimization in Grid Computing Using Unified Checkpointing and Replica Management

Ravi Kant Verma

doi:10.52783/jisem.v10i27s.4655


Published: Mar 27, 2025


Keywords:

dynamic fault tolerance mechanism, Unified Checkpoint


Abstract

This paper presents a novel dynamic fault tolerance mechanism for grid computing that combines the Unified Checkpointing technique with task replication and replica management to improve performance and reliability in distributed, heterogeneous environments. The proposed method addresses challenges inherent in opportunistic grid environments, such as machine failures, network partitions, and fluctuations in resource availability. By dynamically adjusting the number of replicas, monitoring resource status, and recovering from the checkpoint of the most advanced replica, the system minimizes downtime and reduces task execution time. Experimental results from simulations with the GridSim toolkit show a reduction in task execution time of up to 47% compared with traditional approaches. The research highlights the potential of this approach for improving the performance of long-running tasks, especially in unpredictable computing environments such as student laboratories or other resource-constrained settings. Ongoing work focuses on adaptive feedback mechanisms that further tune replica management and checkpointing strategies based on environmental factors.
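The recovery idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Replica` class, the `recover` and `target_replicas` functions, the linear replica-count policy, and the assumption that checkpoints live in shared storage (so a failed replica's checkpoint remains usable) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    node: str
    progress: float = 0.0  # fraction of the task saved in the last checkpoint (0.0 to 1.0)
    alive: bool = True


def most_advanced_checkpoint(replicas):
    """Return the highest checkpointed progress across all replicas.

    Assumption: checkpoints are kept in shared storage, so a checkpoint
    written by a replica that later failed can still be read.
    """
    return max((r.progress for r in replicas), default=0.0)


def target_replicas(failure_rate, min_replicas=1, max_replicas=5):
    """Hypothetical policy: scale the replica count linearly with the
    observed failure rate (0.0 to 1.0), clamped to the allowed range."""
    scaled = min_replicas + round(failure_rate * (max_replicas - min_replicas))
    return max(min_replicas, min(max_replicas, scaled))


def recover(replicas, spare_nodes, target_count):
    """Drop failed replicas and start replacements on spare nodes.

    Each replacement resumes from the most advanced checkpoint rather
    than from the beginning. Note: spare_nodes is consumed in place.
    """
    best = most_advanced_checkpoint(replicas)
    survivors = [r for r in replicas if r.alive]
    while len(survivors) < target_count and spare_nodes:
        survivors.append(Replica(node=spare_nodes.pop(0), progress=best))
    return survivors
```

In this sketch, a replacement replica inherits the furthest saved progress of any replica, which is the property the abstract credits with reducing downtime; a real system would also need to transfer the checkpoint data itself and verify its integrity.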

Issue

Vol. 10 No. 27s (2025)

Section

Articles

Journal of Information Systems Engineering and Management
