15–16 Nov 2025
Indian Institute of Science
Asia/Kolkata timezone

Scaling AI Infra: Data Pipelines, Orchestration, and Distributed Training with Ray

15 Nov 2025, 14:35
50m
Indian Institute of Science

Bengaluru, India
Workshop (60 mins) Artificial Intelligence & Machine Learning (AI/ML)

Speaker

Prarabdha Srivastava
IISc

Description

Modern AI systems are no longer bottlenecked by models—they are bottlenecked by infrastructure. Training and deploying state-of-the-art models requires managing terabytes of multimodal data, orchestrating distributed GPU clusters, and ensuring reproducibility, data consistency & fault tolerance. The difference between a successful AI project and an abandoned prototype often comes down to the invisible layer of infrastructure: how data is stored, streamed, preprocessed, and served for training and inference.

In this talk, we will unpack why building robust AI infrastructure has become the most important problem in both academia and industry. We will explore how open-source tools can level the playing field, enabling even small teams, whether working in research or building products, to handle data and computation at scale with far less overhead. We will introduce Ray, an emerging distributed computing framework, and demonstrate how it simplifies complex workflows: scaling from a laptop to multi-GPU clusters, streaming petabyte-scale datasets, and orchestrating training and inference pipelines without added complexity.

Crux of this workshop:

  1. A clear understanding of the design trade-offs in large-scale AI
    infra (storage formats, ingestion, orchestration, inference).

  2. A practical guide to using Ray, vLLM, KubeRay, and related tools on
    Ubuntu, from distributed training and dataset versioning in academic
    research to building scalable pipelines and robust model serving in
    industrial deployments.

  3. Common pitfalls in building resilient AI infrastructure, and how to avoid them.
