Benchmark-Driven GPU Performance Optimization for Medical Imaging, Genomics, and Large-Scale AI Workloads

Rakesh Challa

Published: Nov 19, 2024

Keywords:

GPU Acceleration, High-Performance Computing (HPC), Benchmarking, HPL Tuning, NCCL Optimization.

Abstract

This paper shows how benchmarking can guide the optimization of GPU-accelerated workloads in clinical imaging, genomics research, and generative AI training. We evaluated High-Performance Linpack (HPL) tuning, memory throughput optimization, NCCL communication optimization, and GPU health validation on multi-GPU clusters. Peak floating-point performance ranged from 12.3 TFLOPS to 34.7 TFLOPS across the GPUs tested. Memory optimizations increased effective bandwidth by 1.8-2.2x. In distributed AI workloads, NCCL optimization cut communication latency by 35-42 percent, and memory virtualization allowed large models, including VGG-16 (batch size 256), to be trained on a 12 GB GPU with only an 18 percent loss in performance. In medical imaging, GPU acceleration reduced reconstruction time by a factor of 2.1-3.3 with no loss of quality. Genomics pipelines identified microRNAs almost 166x faster than on a CPU. These findings demonstrate that benchmark-driven optimization can shorten time-to-diagnosis and training time and improve cluster utilization in healthcare and AI.

Issue

Vol. 10 No. 1 (2025)

Section

Articles

Journal of Information Systems Engineering and Management
