On the sad state of affairs Please note that this particular talk relies heavily on animations, and they are not displayed properly in the slideshow above. This will be fixed soon-ish.
You’re merrily walking by your production environment one day when you notice that… something feels odd. You can’t quite put your finger on it, though. It’s not like there are things literally on fire. It’s not like you have any unique identifier to chase. Something just feels wrong. You know it’s probably wrong, even. You just don’t quite know what. And why.
How do you go about debugging these kind of failures? In this talk we look at what the experience of debugging could look like when using different forms of time-travelling debuggers, and then dive into the technical concepts that make this possible. Want to have time-travelling always enabled in production? Recording years of data, even? That is possible, and we look into what kind of trade-offs you’ll be looking to make here.
Related material
- Debugging Reinvented: asking and answering why and why not questions about program behavior
— Amy J. Ko, and Brad A. Myers (2008)
Amy’s work on the Whyline debugger is a magical tale of what debugging tools could look like if you looked at them through a lens of augmentation, rather than a one-stop-solution-to-all-your-problems. - Reverb: speculative debugging for web applications
— Ravi Netravali, and James Mickens (2019)
Record-and-replay debugging generally don’t take to changing the source code very well; Ravi and James’ work shows that tracing at higher levels helps significantly with how much fun stuff you can actually do with traces, due to improved semantic restrictions! - The rr debugger
— Robert O’Callahan, Chris Jones, Nathan Froyd, Kyle Huey, Albert Noll, and Nimrod Partush (2017)
Originally designed for debugging hard-to-reproduce failures in Firefox through record-and-replay, rr is a neat example of practical engineering for general-purpose time-travelling debuggers. The limitations of tracing at a very low level are certainly offset by its broad applicability. - Crochet’s tracing support
— Quil (2022)
A lot of the ideas in this talk are just things I’ve been playing around with in Crochet before I had the chance to apply them to a bigger production environment. Crochet’s focuses primarily on domain-specific debuggers. - Erlang’s tracing support The BEAM has a very extensive suite of trace-based debugging tools as libraries, which one can use as a basis for building their own debugging tools, and they were a major influence on Crochet’s trace-debugging design (and, consequently, on this talk as well).