Oct 25 – 27, 2024
The Hague, Netherlands
Europe/Amsterdam timezone

How TCPX optimizes RDMA and GPUDirect performance for AI/ML and HPC workloads

Oct 26, 2024, 12:00 PM
25m
KWA - Plenary room (World Forum The Hague)

1000
Talk (25 Minutes) Infrastructure

Speakers

Hassan Tasneem
Hugo Huang

Description

Remote Direct Memory Access (RDMA) is a technology that lets two networked computers exchange data in main memory without involving the processor, cache, or operating system of either computer. GPUDirect RDMA provides a direct path for data exchange between a GPU and a third-party peer device, such as a network interface card, using standard PCI Express features.

GPUDirect-TCPX is a custom, open-source remote direct memory access (RDMA) networking stack that increases the network performance of accelerator VMs by letting data packet payloads move directly from GPU memory to the network interface, bypassing the CPU and system memory.

AI/ML and HPC workloads are data-intensive, so network efficiency directly limits their performance. Traditional TCP often cannot meet the high-throughput, low-latency requirements of modern compute-intensive applications. This session explains how GPUDirect-TCPX, a custom, open-source networking stack, improves network performance by optimizing RDMA and GPUDirect communication.

By the end of this session, participants will understand how TCPX improves RDMA and GPUDirect performance, and how to use it to build faster, more efficient networks for data-intensive applications.

Session author's bio

Hassan Tasneem is a seasoned technology leader with over 15 years of experience in the cloud industry. He has held various leadership positions at both Amazon Web Services (AWS) and Google Cloud.
At AWS, Hassan pioneered several cloud services, including End-User Computing and Amazon Linux Desktops. He currently leads operating system technology for Google Compute Engine.
Hassan's passion for innovation and his deep understanding of cloud technologies have enabled him to make significant contributions to the industry. He is known for his ability to identify emerging trends and transform them into successful products.
Outside of work, Hassan enjoys spending time with his family and engaging in outdoor sports.

Hugo is an expert in Cloud Computing and Business Models, leading joint innovation between Canonical and Google. He has 18 years' experience in Digital Transformation, including deep engagement in Open Source, Cloud Computing, 5G, AI, Cyber Security and Remote Working. He is also a passionate leader of global, multicultural, cross-functional teams spanning Product Management, Engineering, Strategy, Business Development, and Leadership Management. He holds an MBA from MIT Sloan. Outside of work, Hugo enjoys skiing and hiking with his lovely family.

Level of difficulty: Advanced

Presentation materials

There are no materials yet.