Ramiro Matteoda

Senior Software Engineer @ ArionKoder

A skillful Software Engineer with a history of implementing and architecting core features of complex software products, including distributed systems and microservice platforms, complemented by a master’s degree in Software Engineering and a bachelor’s degree in Telecommunication Engineering. What makes Ramiro stand out is his expertise in designing and implementing complex system integrations. Self-motivated and able to collaborate with a team in fast-paced, agile, and innovative environments, he is eager to learn new technologies and modern trends.

Talk:
Concurrent AI Evaluation: Scaling Model Performance Monitoring With OTP

Evaluating model performance remains a significant challenge in the rapidly evolving AI landscape. Traditional evaluation approaches often struggle with scale, consistency, and real-time feedback integration—precisely the problems that Elixir and the BEAM were designed to solve.

We’ll explore:

Live Benchmarking Pipelines: Implementing resilient GenServers and dynamic supervision trees that continuously process evaluation data at scale.
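
As a taste of the pattern, here is a minimal sketch of a dynamically supervised benchmarking worker. Module names, the tick interval, and the scoring stub are illustrative assumptions, not the talk's actual code:

```elixir
defmodule EvalBench.RunSupervisor do
  # Dynamic supervisor that starts one worker per benchmark run.
  use DynamicSupervisor

  def start_link(opts), do: DynamicSupervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts), do: DynamicSupervisor.init(strategy: :one_for_one)

  def start_run(run_id) do
    DynamicSupervisor.start_child(__MODULE__, {EvalBench.RunWorker, run_id})
  end
end

defmodule EvalBench.RunWorker do
  # GenServer that periodically pulls fresh evaluation data and records scores.
  use GenServer

  def start_link(run_id), do: GenServer.start_link(__MODULE__, run_id)

  @impl true
  def init(run_id) do
    schedule_tick()
    {:ok, %{run_id: run_id, scores: []}}
  end

  @impl true
  def handle_info(:tick, state) do
    # Placeholder: fetch a batch, score it, and accumulate results.
    new_scores = evaluate_batch(state.run_id)
    schedule_tick()
    {:noreply, %{state | scores: new_scores ++ state.scores}}
  end

  defp schedule_tick, do: Process.send_after(self(), :tick, :timer.seconds(30))
  defp evaluate_batch(_run_id), do: []
end
```

Because each run lives in its own supervised process, a crashed or hung evaluation restarts in isolation without touching the rest of the pipeline.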

Concurrent Prompt Evaluation: Building distributed worker pools that can evaluate thousands of prompt variations across multiple LLM providers.
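
A rough sketch of that fan-out idea using Task.async_stream with bounded concurrency; the provider call is stubbed and all names are illustrative:

```elixir
defmodule EvalBench.PromptRunner do
  # Fan a list of prompt variations out across providers with bounded concurrency.
  # `call_provider/2` stands in for a real provider client.

  def evaluate(prompts, providers, max_concurrency \\ 50) do
    for provider <- providers, prompt <- prompts do
      {provider, prompt}
    end
    |> Task.async_stream(
      fn {provider, prompt} -> {provider, prompt, call_provider(provider, prompt)} end,
      max_concurrency: max_concurrency,
      timeout: 30_000,
      on_timeout: :kill_task
    )
    |> Enum.map(fn
      {:ok, result} -> result
      {:exit, reason} -> {:error, reason}
    end)
  end

  defp call_provider(_provider, _prompt), do: {:ok, "stubbed completion"}
end
```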

Systematic Human-in-the-Loop Automation: Designing resilient feedback processing pipelines using GenStage and Broadway that validate human annotations, detect inconsistent labelers, and automatically route corrections into training loops.
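
A skeleton of what such a Broadway pipeline might look like; the producer is a placeholder and the validation logic is stubbed, so treat it as a sketch rather than a working feedback system:

```elixir
defmodule EvalBench.FeedbackPipeline do
  # Broadway sketch: validate human annotations and route suspect labels
  # to a separate batcher for review. In practice the producer would read
  # from a real queue (SQS, RabbitMQ, etc.).
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [module: {Broadway.DummyProducer, []}],
      processors: [default: [concurrency: 10]],
      batchers: [accepted: [batch_size: 100], review: [batch_size: 20]]
    )
  end

  @impl true
  def handle_message(_processor, message, _context) do
    if consistent_annotation?(message.data) do
      Broadway.Message.put_batcher(message, :accepted)
    else
      Broadway.Message.put_batcher(message, :review)
    end
  end

  @impl true
  def handle_batch(:accepted, messages, _batch_info, _context) do
    # Placeholder: push validated annotations into the training loop.
    messages
  end

  def handle_batch(:review, messages, _batch_info, _context) do
    # Placeholder: flag inconsistent labelers for follow-up.
    messages
  end

  defp consistent_annotation?(_data), do: true
end
```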

LangChain.ex Integration: Implementing “LLM-as-a-judge” evaluation patterns using LangChain.ex to create sophisticated, criteria-based evaluations of AI outputs with minimal code overhead.
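
A condensed example of the judge pattern with the langchain Hex package. Module names follow the library, but the prompt, model choice, and scoring scheme are assumptions, and the exact return shape of LLMChain.run/1 has changed between releases, so check the docs for your version:

```elixir
defmodule EvalBench.Judge do
  # LLM-as-a-judge sketch: ask a model to grade a candidate answer
  # against explicit criteria.
  alias LangChain.Chains.LLMChain
  alias LangChain.ChatModels.ChatOpenAI
  alias LangChain.Message

  def score(candidate_answer, criteria) do
    chain =
      %{llm: ChatOpenAI.new!(%{model: "gpt-4o", temperature: 0.0})}
      |> LLMChain.new!()
      |> LLMChain.add_messages([
        Message.new_system!("You are a strict evaluator. Reply with a 1-5 score and one sentence of rationale."),
        Message.new_user!("Criteria: #{criteria}\n\nCandidate answer: #{candidate_answer}")
      ])

    # Recent langchain releases return {:ok, updated_chain}; older ones
    # also return the last response as a third element.
    {:ok, updated_chain} = LLMChain.run(chain)
    updated_chain.last_message.content
  end
end
```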

This talk demonstrates why Elixir’s unique strengths make it the ideal platform for building AI evaluation systems that scale from prototype to production.

Key Takeaways:

  • This talk aims to bridge the gap between theoretical AI evaluation concepts and practical, production-ready implementations using Elixir’s unique capabilities. The audience will learn:

    1. How to design and implement scalable AI evaluation architectures using OTP principles
    2. Practical patterns for handling the inherent uncertainty in AI evaluation (timeouts, rate limits, inconsistent results), in the spirit of the sketch after this list
    3. How to integrate human feedback loops into automated evaluation pipelines
    4. Working code examples using LangChain.ex and other Elixir AI libraries that they can adapt for their own projects
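
A minimal sketch of the retry-with-backoff pattern referenced in the second takeaway; the error atoms and function names are illustrative:

```elixir
defmodule EvalBench.Retry do
  # Retry a provider call with exponential backoff to absorb
  # transient timeouts and rate limits during evaluation runs.

  def with_backoff(fun, attempts \\ 3, delay_ms \\ 500) do
    case fun.() do
      {:ok, result} ->
        {:ok, result}

      {:error, reason} when attempts > 1 and reason in [:timeout, :rate_limited] ->
        Process.sleep(delay_ms)
        with_backoff(fun, attempts - 1, delay_ms * 2)

      {:error, reason} ->
        {:error, reason}
    end
  end
end

# Usage: EvalBench.Retry.with_backoff(fn -> call_llm(prompt) end)
```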

Target Audience:

  • Elixir developers working with AI/ML systems, AI engineers interested in scalable evaluation architectures, and teams building production LLM applications that need robust monitoring solutions.