CXL-Aware Resource Orchestration for Disaggregated AI Server Platforms
Abstract
Modern AI applications demand high compute throughput and have widely varying memory requirements. Traditional monolithic servers are often underutilized, which leads to inefficiency. This paper proposes a CXL-aware resource orchestration system for disaggregated AI server platforms, in which workload demand drives the on-demand allocation of compute, memory, and accelerator resources. The architecture combines CXL-enabled memory, which extends physical memory across servers while keeping latency low, with workload-aware scheduling that reduces execution time and network overhead. AI training, big data, and graph processing workloads were evaluated through simulation and prototyping. The results indicate that the proposed system reduces overall workload makespan by 15-18 percent relative to baseline systems and improves effective memory access latency by 10-15 percent. Resource utilization stays above 75% across compute, memory, and accelerator nodes, and QoS violations stay below 5% as workloads scale. Network-aware scheduling mitigates the effects of congestion, so execution time remains near optimal even under high network utilization. These findings suggest that combining disaggregated memory with intelligent orchestration can substantially improve the performance, efficiency, and scalability of AI data centers.
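To make the idea of network-aware, CXL-aware placement concrete, the following is a minimal sketch and not the paper's actual orchestrator. It assumes a simple cost model in which a job's runtime grows with the fraction of its memory served from a shared CXL pool and with the utilization of the chosen node's fabric link. All names (`Node`, `Job`, `place`) and the constants `CXL_PENALTY` and `CONGESTION_PENALTY` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cores: int
    free_local_gb: float
    free_accels: int
    link_util: float  # 0.0-1.0, current utilization of the node's fabric link


@dataclass
class Job:
    name: str
    cores: int
    mem_gb: float
    accels: int
    base_runtime_s: float  # runtime if all memory were local and the link idle


# Hypothetical slowdown factors, not taken from the paper.
CXL_PENALTY = 0.3         # inflation if all memory came from the CXL pool
CONGESTION_PENALTY = 0.5  # extra inflation at 100% link utilization


def estimate_runtime(job, node, cxl_free_gb):
    """Return (estimated runtime, CXL GB borrowed), or None if the job does not fit."""
    if job.cores > node.free_cores or job.accels > node.free_accels:
        return None
    local = min(job.mem_gb, node.free_local_gb)
    borrowed = job.mem_gb - local
    if borrowed > cxl_free_gb:
        return None
    cxl_fraction = borrowed / job.mem_gb if job.mem_gb else 0.0
    # Congestion only affects the share of memory traffic that crosses the fabric.
    slowdown = 1.0 + cxl_fraction * (CXL_PENALTY + CONGESTION_PENALTY * node.link_util)
    return job.base_runtime_s * slowdown, borrowed


def place(job, nodes, cxl_free_gb):
    """Greedy choice: the feasible node with the lowest estimated runtime, or None."""
    best = None
    for node in nodes:
        est = estimate_runtime(job, node, cxl_free_gb)
        if est is None:
            continue
        runtime, borrowed = est
        if best is None or runtime < best[1]:
            best = (node, runtime, borrowed)
    return best


nodes = [
    Node("n0", free_cores=16, free_local_gb=64, free_accels=2, link_util=0.8),
    Node("n1", free_cores=16, free_local_gb=32, free_accels=2, link_util=0.1),
]
job = Job("train", cores=8, mem_gb=128, accels=2, base_runtime_s=100.0)
choice = place(job, nodes, cxl_free_gb=256)
print(choice[0].name, choice[1], choice[2])
```

Under these assumed constants, the sketch prefers `n1` even though it has less local memory, because its lightly loaded link makes borrowing from the CXL pool cheaper than on the congested `n0`. A real orchestrator would also need to update node and pool state after each placement and account for QoS targets, which this sketch omits.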