Karlo Smid

Elixir Developer @ Yolo

Karlo created his first core dump file in C in 1994 and his first Elixir core dump file in 2016. Currently, he is creating Elixir code that runs as a microservice in the Yolo infrastructure. In his free time, Karlo walks, does stand-up comedy, practices Rubik Cube solving, and he is probably The Worst Chess Player In The World (TWCPITW).

Talk:
Taming 20 Million Oban Jobs in Production

What happens when a queue meant for thousands of jobs swells to 20 million—and keeps growing? In this talk, I’ll walk you through a real-world incident where our Oban queue spiraled out of control in production. We’ll dissect what led to this situation, how we investigated the root cause without halting traffic, and the changes we made to restore stability. This isn’t just a story about background jobs—it’s a practical lesson in diagnosing live systems, managing back pressure, and making safe trade-offs in production. As a bonus, I’ll share battle-tested design tips for building resilient systems with Oban—from queue isolation and job retries to rate-limiting and observability. Whether you’re scaling a system or preventing your own “20 million jobs” moment, these lessons will help.

Key Takeaways:

  • Strategies for debugging and resolving large-scale queue backlogs

  • Practical advice for running Oban in production environments

  • How to design job-processing systems that are resilient and observable

  • Techniques for managing job volume, retries, and queue isolation

  • Bonus tips from real-world experience designing with Oban

Target Audience:

  • Elixir developers, DevOps, and engineering leads working with or planning to adopt Oban.