A skilled Software Engineer with a history of architecting and implementing core features of complex software products, including distributed systems and microservice platforms, complemented by a master’s degree in Software Engineering and a bachelor’s degree in Telecommunication Engineering. What makes Ramiro stand out is his expertise in designing and implementing complex system integrations. Self-motivated and collaborative, he thrives in fast-paced, agile, innovative environments and is eager to learn new technologies and modern trends.
Evaluating model performance remains a significant challenge in the rapidly evolving AI landscape. Traditional evaluation approaches often struggle with scale, consistency, and real-time feedback integration—precisely the problems that Elixir and the BEAM were designed to solve.
We’ll explore:
Live Benchmarking Pipelines: Implementing resilient GenServers and dynamic supervision trees that continuously process evaluation data at scale (see the first sketch after this list).
Concurrent Prompt Evaluation: Building distributed worker pools that can evaluate thousands of prompt variations across multiple LLM providers (second sketch below).
Systematic Human-in-the-Loop Automation: Designing resilient feedback-processing pipelines using GenStage and Broadway that validate human annotations, detect inconsistent labelers, and automatically route corrections into training loops (third sketch below).
LangChain.ex Integration: Implementing “LLM-as-a-judge” evaluation patterns using LangChain.ex to create sophisticated, criteria-based evaluations of AI outputs with minimal code overhead (fourth sketch below).
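As a taste of the first topic, here is a minimal sketch of a live benchmark run modeled as a GenServer started under a DynamicSupervisor. All names (EvalDemo.BenchmarkRun, EvalDemo.Registry, EvalDemo.RunSupervisor) are hypothetical, and it assumes a unique Registry and a DynamicSupervisor are already running in the application’s supervision tree:

```elixir
# Hypothetical sketch: one GenServer per live benchmark run, started under a
# DynamicSupervisor so a crashed run restarts without touching its siblings.
defmodule EvalDemo.BenchmarkRun do
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: via(opts[:run_id]))
  end

  # Registry-based naming; assumes a unique EvalDemo.Registry is started elsewhere.
  defp via(run_id), do: {:via, Registry, {EvalDemo.Registry, run_id}}

  def record(run_id, score), do: GenServer.cast(via(run_id), {:record, score})
  def summary(run_id), do: GenServer.call(via(run_id), :summary)

  @impl true
  def init(opts), do: {:ok, %{run_id: opts[:run_id], scores: []}}

  @impl true
  def handle_cast({:record, score}, state) do
    {:noreply, %{state | scores: [score | state.scores]}}
  end

  @impl true
  def handle_call(:summary, _from, %{scores: scores} = state) do
    mean = if scores == [], do: nil, else: Enum.sum(scores) / length(scores)
    {:reply, %{count: length(scores), mean: mean}, state}
  end
end

# Starting a run dynamically; assumes EvalDemo.RunSupervisor is a DynamicSupervisor:
# {:ok, _pid} =
#   DynamicSupervisor.start_child(EvalDemo.RunSupervisor,
#     {EvalDemo.BenchmarkRun, run_id: "gpt-4o-vs-claude"})
```

Isolating each run in its own process is what lets a long-lived benchmarking pipeline keep flowing when one evaluation misbehaves.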
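For the second topic, a sketch of fanning prompt variations out across providers with bounded concurrency using Task.async_stream from the standard library; call_provider/2 is a hypothetical stand-in for real HTTP client calls:

```elixir
# Hypothetical sketch: evaluate every prompt against every provider concurrently,
# with a concurrency cap and per-call timeout so slow providers cannot stall the run.
defmodule EvalDemo.PromptEval do
  @providers [:openai, :anthropic, :mistral]

  def evaluate(prompts) when is_list(prompts) do
    for prompt <- prompts, provider <- @providers do
      {provider, prompt}
    end
    |> Task.async_stream(
      fn {provider, prompt} -> {provider, prompt, call_provider(provider, prompt)} end,
      max_concurrency: 50,
      timeout: 30_000,
      on_timeout: :kill_task
    )
    |> Enum.map(fn
      {:ok, result} -> result
      {:exit, reason} -> {:error, reason}
    end)
  end

  # Placeholder: a real implementation would call each provider's API client here.
  defp call_provider(provider, prompt) do
    {:ok, "response from #{provider} for #{String.slice(prompt, 0, 20)}"}
  end
end
```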
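For the third topic, a sketch of a Broadway pipeline that validates incoming human annotations. The module name and validation rule are illustrative, and Broadway.DummyProducer stands in for a real queue producer such as SQS or RabbitMQ:

```elixir
# Hypothetical sketch of a Broadway pipeline for human feedback. Messages that
# fail validation are marked failed, where they can be routed to review or
# used to flag inconsistent labelers.
defmodule EvalDemo.FeedbackPipeline do
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [
        # Swap for a real producer (e.g. BroadwaySQS or BroadwayRabbitMQ).
        module: {Broadway.DummyProducer, []},
        concurrency: 1
      ],
      processors: [default: [concurrency: 10]]
    )
  end

  @impl true
  def handle_message(_processor, message, _context) do
    annotation = message.data

    # Illustrative rule: accept only scores in the expected range; real
    # criteria (inter-annotator agreement, etc.) would be domain-specific.
    if valid?(annotation) do
      message
    else
      Broadway.Message.failed(message, :inconsistent_label)
    end
  end

  defp valid?(%{score: score}) when score in 1..5, do: true
  defp valid?(_), do: false
end
```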
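And for the fourth topic, a hedged sketch of an “LLM-as-a-judge” chain built with LangChain.ex (the langchain hex package). It follows the 0.3-series API, where LLMChain.run/1 returns the updated chain; return shapes differ across versions, and the judging criteria here are illustrative only:

```elixir
# Hypothetical sketch: ask one model to grade another model's answer against
# fixed criteria. Module names follow LangChain.ex; the prompt is an assumption.
defmodule EvalDemo.Judge do
  alias LangChain.ChatModels.ChatOpenAI
  alias LangChain.Chains.LLMChain
  alias LangChain.Message

  @criteria "Rate the answer from 1 to 5 for factual accuracy and relevance. " <>
              "Reply with the number only."

  def judge(question, answer) do
    {:ok, chain} =
      %{llm: ChatOpenAI.new!(%{model: "gpt-4o", temperature: 0.0})}
      |> LLMChain.new!()
      |> LLMChain.add_messages([
        Message.new_system!(@criteria),
        Message.new_user!("Question: #{question}\nAnswer: #{answer}")
      ])
      |> LLMChain.run()

    # The judge's verdict is the content of the last assistant message.
    chain.last_message.content
  end
end
```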
This talk demonstrates why Elixir’s unique strengths make it the ideal platform for building AI evaluation systems that scale from prototype to production.
Key Takeaways:
This talk aims to bridge the gap between theoretical AI evaluation concepts and practical, production-ready implementations built with Elixir. The audience will learn:
Target Audience: