Optimizing Cost-Effective Cloud Data Pipeline Orchestration across Multiple Cloud Providers
Abstract
Multi-cloud data systems offer flexibility and potential performance advantages, but they make it difficult to control variable execution costs and to guarantee performance. This research proposes an independent, cost-aware orchestration system that dynamically redirects data pipeline workloads between cloud providers to reduce cost without violating the service-level agreement (SLA). Machine learning models predict execution cost, latency, and SLA satisfaction, and these predictions inform orchestration decisions. Experiments on realistic cloud workload data show that ensemble-based models, especially Random Forest, capture the complex cost-performance tradeoffs more effectively than linear models. The findings indicate that adaptive, learning-based orchestration is more efficient than fixed multi-cloud scheduling methods.
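The decision step described above, choosing where to run a workload from predicted cost, latency, and SLA satisfaction, can be sketched as a constrained selection: among providers predicted to meet the SLA, pick the cheapest. This is a minimal sketch, not the study's method. It assumes the cost, latency, and SLA models have already produced per-provider predictions. The class `ProviderEstimate`, the function `choose_provider`, the provider names, the latency budget, and the 0.95 SLA-probability threshold are all hypothetical. The simple greedy rule shown here stands in for whatever orchestration policy the paper actually uses.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProviderEstimate:
    """Hypothetical model outputs for running one workload on one provider."""
    provider: str
    predicted_cost: float      # e.g. USD per run, from a cost-prediction model
    predicted_latency: float   # e.g. seconds, from a latency-prediction model
    sla_probability: float     # predicted probability that the SLA is satisfied


def choose_provider(
    estimates: Sequence[ProviderEstimate],
    latency_budget: float,
    min_sla_probability: float = 0.95,  # hypothetical threshold, not from the study
) -> Optional[ProviderEstimate]:
    """Return the cheapest provider whose predictions satisfy the SLA, or None."""
    # Keep only providers predicted to meet both the latency budget and the SLA target.
    feasible = [
        e for e in estimates
        if e.predicted_latency <= latency_budget
        and e.sla_probability >= min_sla_probability
    ]
    if not feasible:
        return None  # no provider is predicted to meet the SLA; caller must fall back
    # Minimize cost only among SLA-feasible options.
    return min(feasible, key=lambda e: e.predicted_cost)


# Placeholder predictions for illustration only (not data from the study).
candidates = [
    ProviderEstimate("provider_a", 4.20, 310.0, 0.97),
    ProviderEstimate("provider_b", 3.10, 450.0, 0.88),
    ProviderEstimate("provider_c", 3.60, 280.0, 0.96),
]

best = choose_provider(candidates, latency_budget=400.0)
print(best.provider if best else "no SLA-feasible provider")
```

With these placeholder numbers, `provider_b` is cheapest but is predicted to exceed the latency budget, so the cheapest SLA-feasible choice is `provider_c`. Filtering before minimizing keeps cost reduction strictly subordinate to the SLA constraint, which matches the stated goal of reducing cost without violating the SLA.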