Programming ________ Systems: Distributed | Collaborative | AI/LLM

Staff - Faculty of Informatics

Date: 20 June 2024 / 14:00 - 15:00

USI East Campus, Room C1.03

Speaker: Heather Miller, Carnegie Mellon University, USA

Abstract
In this talk, I’ll discuss (the act of) programming systems that we rarely think of as things to be programmed in themselves. We think about “hacking on” a distributed system, making and handling remote requests while manually managing local or remote data, or hand-tuning long and complex LLM prompts, or trying to wrangle concurrency and figure out how to program a collaborative app (how does one even begin to implement something like Google Docs?). But we don’t think about these things, or parts of these things, as programming components in their own right. Broadly, my research to date has focused on reasoning about the behavior and performance of parts of these systems, and on helping end developers more directly, and correctly, build distributed, concurrent, collaborative, and, now, AI-enabled systems, without repeatedly mucking around in the same low-level details.
I’ll start by presenting results in the realm of programming distributed systems, inspired by my work with the Apache Spark project and the Scala programming language. In frameworks like Spark, the key idea is to ship functionality to large amounts of distributed data, which is a markedly error-prone affair. I’ll discuss some of my research results here that increased the reliability of distributing functions and objects, while providing better performance than state-of-the-art approaches for serializing and distributing data.
From there, I’ll touch upon ongoing work in my lab to bring better and more composable reasoning abilities to the realm of building collaborative software systems, like Google Docs, Google Slides, or even collaborative multiplayer games. I’ll discuss a programming system called Collabs and some corresponding composition techniques that aim to make building collaborative apps more like building regular, non-collaborative apps, without sacrificing performance.
And finally, I’ll touch upon more recent work I have begun to undertake: LM programming, or programming Compound AI Systems, that is, building pipelines that contain AI components like LLMs, custom inference logic, retrieval models, etc., as component parts. While it is becoming increasingly easy to build impressive demos with language models (LMs), turning these demos into reliable production systems remains a messy affair, requiring complex and hand-tuned combinations of prompting, chaining, and fine-tuning LMs. I’ll talk briefly about DSPy, a programming model for LM components that I contributed to, which aims to tame this messy status quo, and about where I see us going from here.

Biography
Heather Miller is an Assistant Professor in Carnegie Mellon University’s School of Computer Science, primarily affiliated with the Software and Societal Systems Department, where, with Ben Titzer, she co-founded the WebAssembly Research Center. Previously, she was an Assistant Clinical Professor at Northeastern University’s College of Computer and Information Science in Boston while serving as the co-founder and Executive Director of the Scala Center at EPFL, where she was also a Research Scientist. She completed her PhD in EPFL’s Faculty of Computer and Communication Science, advised by Prof. Martin Odersky, where she contributed to the now-widespread programming language Scala. Heather’s research interests lie broadly in programming systems: distributed systems, collaborative software systems, and, more recently, Compound AI Systems, that is, systems containing AI components like LLMs or retrieval models. Whether at the intersection of data-centric distributed systems and programming languages, or in Compound AI Systems, all of her work maintains a focus on transferring her research results into industrial use. She has also led the development of popular MOOCs over 1M students strong, such as “Functional Programming Principles in Scala” and her own MOOC, “Big Data Analysis with Scala and Spark.” Heather’s work has been recognized by awards such as the 2019 ACM SIGPLAN Programming Languages Software Award, and she is the recipient of the 2023 Dahl-Nygaard Junior Prize.

Host: Prof. Marc Langheinrich