Mitigation and Detection of Faulty Nodes in Multinode Hadoop Cluster using Hybrid Machine Learning Techniques

Atul V. Dusane, Chaitali G. Patil, Shivganga. C. Maindargi, Suman Kumar Swarnkar, Dattatray G Takale

Abstract

In modern large-scale computing systems, jobs are split into smaller tasks that run in parallel to raise throughput and lower energy consumption. However, straggler tasks, slow-running tasks that lengthen the total response time, are a common performance problem in such systems and can substantially degrade the Quality of Service (QoS) the system provides. Addressing this problem requires automatic straggler detection and mitigation mechanisms that let jobs finish sooner. Previous work often builds reactive frameworks that first identify and then mitigate straggler tasks, an ordering that itself introduces delay. Other studies use prediction-based proactive systems, but they disregard the characteristics of heterogeneous hosts or dynamic tasks. This article proposes Hybrid Machine Learning (HML), a method that predicts which tasks are likely to fall behind schedule and dynamically adjusts scheduling to achieve faster response times. The proposed method analyses all tasks and hosts by their use of compute and network resources, and it predicts and mitigates the effects of expected stragglers, which speeds up execution without lowering QoS. HML is evaluated on QoS factors such as energy usage, processing time, resource contention, and CPU utilisation against existing machine learning methods, including Support Vector Machine (SVM), AdaBoost, Artificial Neural Network (ANN), Naive Bayes (NB), Decision Tree (DT), and Random Forest (RF).
According to the evaluation results, the proposed HML reduces processing time, resource contention, and energy usage by 13.5%, 11.25%, and 16.75%, respectively, compared with standard machine learning methods. The proposed HML achieves an accuracy of 98.1%, higher than that of the other conventional ML methods.
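
The abstract describes stragglers only informally, as tasks that run slowly enough to lengthen a job's response time. As a rough illustration of the reactive detection that the abstract contrasts with HML (and not the HML method itself), the sketch below flags tasks whose runtime exceeds a multiple of the job's median runtime. The function name `flag_stragglers`, the 1.5 threshold, and the sample durations are all assumptions made for illustration; none of them come from the article.

```python
from statistics import median


def flag_stragglers(durations, threshold=1.5):
    """Return the ids of tasks slower than `threshold` times the median runtime.

    `durations` maps a task id to its runtime in seconds. The 1.5 default is
    a hypothetical cutoff chosen for this example, not a value from the article.
    """
    if not durations:
        return []
    cutoff = threshold * median(durations.values())
    # A task counts as a straggler only if it is strictly slower than the cutoff.
    return sorted(task for task, secs in durations.items() if secs > cutoff)


# Hypothetical runtimes for four map tasks of a single job.
job = {"map-0": 41.0, "map-1": 39.5, "map-2": 44.2, "map-3": 97.8}
print(flag_stragglers(job))  # ['map-3']
```

A rule like this can only react once a task has already run long, which is the delay the abstract attributes to reactive frameworks; a predictive approach would instead estimate the slowdown from task and host resource usage before it occurs.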