In this episode, hosts Dr. Suvinay Subramanian and Dr. Lisa Hsu are joined by Dr. Arka Prava Basu, an Associate Professor at the Indian Institute of Science. Dr. Basu's research focuses on pushing the boundaries of memory management and software reliability for both CPUs and GPUs. His work ranges from optimizing memory systems for chiplet-based GPUs to developing innovative techniques for eliminating synchronization bottlenecks in GPU programs. He is a recipient of the Intel Rising Star Faculty Award and the ACM India Early Career Award, among other accolades for his contributions to GPU performance, programmability, and reliability.
The discussion delves into the technical aspects of GPU programmability, with a particular emphasis on software reliability and efficiency. Dr. Basu shares insights from his research, covering the evolution of GPU programming, the challenges of managing concurrency and memory, and the approaches his team is developing to address them. The conversation also touches on the interplay between hardware and software in GPU systems and why understanding this relationship is crucial for unlocking further performance gains. Listeners will come away with a picture of the current state and future directions of GPU architecture and programming, sprinkled with some nostalgic reflections on the field's journey.
Chapters
00:00:13 — Introduction of Dr. Arka Prava Basu
00:01:52 — What Gets Dr. Basu Up in the Morning: The Shift from CPU to GPU Dominance
00:04:04 — The Biggest Difference Between CPU and GPU Programming
00:06:12 — Challenges of Correctness and Performance in GPU Programming
00:07:12 — The Concept of "Quality" in GPU Software: Correctness and Efficiency
00:08:00 — Synchronization Bugs and Their Impact on GPU Program Reliability
00:13:30 — Synchronization Scopes in GPU Programming
00:15:10 — The Impact of System Architecture on GPU Synchronization
00:17:15 — Challenges and Opportunities with Persistent Memory in GPU Systems
00:20:14 — Tooling for GPU Programmability: ScopeAdvice and Static vs. Dynamic Analysis
00:24:38 — Dr. Basu's Journey into Computer Architecture and GPU Research
00:28:30 — The Role of Serendipity and Long-Term Learning in Research
00:30:00 — Static vs. Dynamic Analysis for GPU Program Optimization
00:31:51 — Leveraging Semantic Information for GPU Performance
00:33:30 — Division of Labor Between CPU and GPU in Modern Systems
00:56:07 — Impact of High-Bandwidth Interconnects (like NVLink) on CPU-GPU Cooperation
01:01:30 — The Future of GPU Programming and Persistent Memory
01:02:30 — Words of Advice for Young Researchers
Takeaways
GPU programming presents unique challenges compared to CPU programming, primarily due to its massively parallel nature and different memory models, requiring developers to rethink synchronization and data management.
Software reliability in GPUs is critical; synchronization bugs, which can be intermittent and hard to debug, can lead to catastrophic failures, emphasizing the need for robust programming practices and tools.
Understanding the hierarchical structure of GPU hardware (SMs, warps, thread blocks) and the concept of synchronization scopes is essential for writing efficient and correct GPU programs.
The division of labor between CPUs and GPUs is evolving, particularly with the advent of unified virtual memory and high-bandwidth interconnects, creating new opportunities for optimizing applications that leverage both processing units.
A long-term perspective and continuous learning are vital in research; initial explorations or collaborations, even if they don't yield immediate results, can provide foundational knowledge that becomes crucial for future breakthroughs.
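To make the "synchronization scopes" takeaway concrete, here is a minimal sketch in CUDA C++ of the idea discussed in the episode: a GPU program can synchronize at block scope (visible only within one thread block) or at device scope (visible GPU-wide), and picking the narrowest scope that is still correct avoids unnecessary GPU-wide coherence traffic. This example uses the libcu++ `cuda::atomic_ref` scoped atomics; the kernel and variable names are illustrative, not taken from the episode or from ScopeAdvice.

```cuda
#include <cuda/atomic>

// Sketch: each thread block first accumulates a partial sum in shared
// memory using a block-scoped atomic, then one thread per block
// publishes that partial sum with a device-scoped atomic.
__global__ void scoped_sum(const int* data, int n, int* result) {
    __shared__ int block_sum;
    if (threadIdx.x == 0) block_sum = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Block scope: only threads in this block must observe the
        // update, so the hardware can keep it local to the SM.
        cuda::atomic_ref<int, cuda::thread_scope_block> bsum(block_sum);
        bsum.fetch_add(data[i], cuda::memory_order_relaxed);
    }
    __syncthreads();

    if (threadIdx.x == 0) {
        // Device scope: every thread on the GPU must be able to
        // observe this update, which is correct but more expensive.
        cuda::atomic_ref<int, cuda::thread_scope_device> gsum(*result);
        gsum.fetch_add(block_sum, cuda::memory_order_relaxed);
    }
}
```

The trade-off the episode highlights cuts both ways: a scope that is too narrow for how the data is actually shared is a correctness bug (often intermittent and hard to reproduce), while a scope that is wider than necessary is a performance bug, which is the kind of mismatch tools like ScopeAdvice aim to surface.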