Machine learning (ML) workloads demand high computational efficiency, particularly when training on GPUs. In practice, however, bottlenecks such as data loading, serialization, and resource contention often leave expensive accelerators idle. This article explores how Kubernetes-powered in-memory data caching, combined with distributed computing frameworks, can optimize ML workflows by reducing GPU idle time, minimizing CPU overhead, and improving scalability. The solution integrates Apache Iceberg tables, Apache Arrow, and tooling from the CNCF ecosystem to deliver a robust, production-ready architecture for large-scale ML training.
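As a concrete illustration, the minimal sketch below shows one way the data path might look from a training job's perspective: scanning an Iceberg table directly into an Arrow table, so batches reach the training loop in columnar form without row-by-row deserialization. It assumes the `pyiceberg` and `pyarrow` libraries; the catalog name, table identifier, and column names are hypothetical placeholders, not part of the architecture described later.

```python
import pyarrow as pa
from pyiceberg.catalog import load_catalog

# Connect to an Iceberg catalog; "default" is a hypothetical catalog name,
# configured via ~/.pyiceberg.yaml or PYICEBERG_* environment variables.
catalog = load_catalog("default")

# "ml.training_samples" is a hypothetical table identifier.
table = catalog.load_table("ml.training_samples")

# Scan only the columns the model needs; the result is an Arrow table whose
# columnar buffers can be handed to the training loop with minimal copying.
arrow_table: pa.Table = table.scan(
    selected_fields=("features", "label"),
).to_arrow()

print(f"Loaded {arrow_table.num_rows} rows, ~{arrow_table.nbytes / 1e6:.1f} MB in memory")
```

Because Arrow's in-memory layout is the same across processes and languages, an Arrow table like this can later be served from a shared cache rather than re-read from object storage on every epoch, which is the core idea developed in the rest of the article.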