In this episode of the "Computer Architecture Podcast," hosts Dr. Suvinay Subramanian and Dr. Lisa Hsu are joined by Christina Delimitrou, an Assistant Professor in the Electrical and Computer Engineering Department at Cornell University. Professor Delimitrou received the 2020 IEEE TCCA Young Computer Architect Award for her leading research in ML-driven management and design of cloud systems. Her work focuses on improving resource efficiency in large-scale data centers, QoS-aware scheduling, performance debugging, and cloud security.
The discussion covers the evolving landscape of data center and cloud architectures, in particular the shift from homogeneous commodity hardware to heterogeneous systems that incorporate specialized accelerators. Professor Delimitrou describes the transition from monolithic applications to microservices, weighing the benefits of modularity against the challenges of management, debugging, and inter-service dependencies. Much of the conversation centers on the role of machine learning in optimizing and managing these systems, with the aim of automating operations and hiding complexity from both end users and operators. The episode also covers Professor Delimitrou's experiences as a junior faculty member, with advice for those navigating academic research.
Chapters
00:00:00 — Introduction by Hosts
00:00:13 — Introducing Professor Christina Delimitrou
00:01:17 — What Gets Professor Delimitrou Up in the Morning?
00:01:46 — Major Trends Changing Data Center and Cloud Computing Landscapes (Evolution from Homogeneous to Heterogeneous Systems, Monoliths to Microservices)
00:03:46 — Expanding on Microservices: Differences from Monoliths and Unique Challenges
00:07:38 — Managing Complexity in Microservices: The Need for Global Visibility and Tracing
00:09:21 — Professor Delimitrou's Research Focus: Hardware and Software Approaches to System Optimization
00:13:24 — Machine Learning for Performance Debugging: The Seer and Sage Systems
00:17:44 — Real-World Application and Benchmarking: DeathStarBench
00:19:38 — Dual Purpose of Benchmarks: Training ML Systems and Informing Hardware Design
00:22:27 — Designing ML for Systems: Inductive Bias and System Understanding
00:24:21 — Idiosyncrasies of ML in Systems: Beyond Accuracy to Latency and Explainability
00:25:31 — The Challenge of Variability and Non-Determinism Introduced by ML
00:27:25 — Hardware vs. Software: Identifying the Harder Problems in System Design
00:28:39 — Tackling Complex, Multi-Layered Problems: A Modular Approach
00:30:38 — Cultivating Student Expertise Across Hardware and Software Stacks
00:31:56 — Professor Delimitrou's Journey as Junior Faculty: Transitions, Time Management, and Grant Writing
00:37:02 — Impact of COVID-19 on Teaching and Advising
00:39:05 — Evolving the Computer Architecture Curriculum for Modern Systems
00:41:40 — Future Research Directions: Microservices, ML for Correctness, and Edge Computing
00:44:58 — The Role of Programming Frameworks and Higher-Level Interfaces
00:47:13 — Words of Wisdom for Aspiring Researchers and Faculty
00:48:18 — Closing Remarks
Takeaways
Cloud computing is evolving from homogeneous, commodity-based systems to complex, heterogeneous environments with specialized accelerators, driven by the need for performance and efficiency but introducing new management challenges.
The shift from monolithic applications to microservices provides modularity and scalability but introduces significant complexities in managing dependencies, performance debugging, and ensuring system-wide stability.
Machine learning is becoming indispensable for managing modern cloud systems, offering automation in areas like performance debugging (e.g., Seer and Sage systems), resource allocation, and abstracting complexity from end-users and operators.
Developing effective ML for systems requires a nuanced approach that goes beyond optimizing for accuracy, prioritizing low inference latency, explainability of models, and the ability to derive actionable insights for system improvement.
For aspiring researchers and junior faculty, Professor Delimitrou advises focusing on problems they are passionate about, starting with a manageable number of students to develop advising skills, and understanding that grant writing is a skill learned over time, with an emphasis on the quality of research.