feat: custom evaluator agent for accuracy evals #3389

manuhortet · 2025-05-28T09:24:55Z

Summary

Updates AccuracyEval to support providing a custom evaluator agent.

Type of change

Checklist

Code complies with style guidelines
Ran format/validation scripts (./scripts/format.sh and ./scripts/validate.sh)
Self-review completed
Documentation updated (comments, docstrings)
Examples and guides: Relevant cookbook examples have been included or updated (if applicable)
Tested in clean environment
Tests added/updated (if applicable)

dirkbrnd · 2025-06-12T15:53:27Z

cookbook/evals/accuracy/additional_evaluation_guidelines.py

+# This is the agent we will use to perform the evaluation
+evaluator_agent = Agent(
+    model=OpenAIChat(id="o4-mini"),
+    tools=[CalculatorTools(enable_all=True)],


Does it need calculator tools?

dirkbrnd · 2025-06-12T15:54:23Z

cookbook/evals/accuracy/custom_response_model.py

+    expected_output="$1,739,130.43",
+)
+
+result: Optional[AccuracyResult] = evaluation.run(print_results=True)


How do you now access that custom response?

dirkbrnd · 2025-06-12T15:55:04Z

libs/agno/agno/eval/accuracy.py

+        """Check if the evaluator agent is using a custom response model"""
+        if not self.evaluator_agent:
+            return False
+        return self.evaluator_agent.response_model is not AccuracyAgentResponse


Shouldn't this use isinstance?

Or actually a type comparison? I'm not sure, just asking.
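For reference on the question above, a minimal sketch of how the options behave, assuming `response_model` holds a class rather than an instance (which is what the `is not AccuracyAgentResponse` comparison in the diff suggests). The classes and helper functions below are hypothetical stand-ins, not the agno implementations.

```python
from typing import Optional, Type


class AccuracyAgentResponse:
    """Stand-in for the default response model (hypothetical, not the agno class)."""


class StricterResponse(AccuracyAgentResponse):
    """Hypothetical subclass of the default model."""


class CustomResponse:
    """Hypothetical custom model unrelated to the default."""


def using_custom_identity(response_model: Optional[Type]) -> bool:
    # Mirrors the check in the diff: anything that is not exactly
    # the default class counts as custom (including None and subclasses).
    return response_model is not AccuracyAgentResponse


def using_custom_subclass(response_model: Optional[Type]) -> bool:
    # Alternative: subclasses of the default are not treated as custom.
    if response_model is None:
        return False
    return not issubclass(response_model, AccuracyAgentResponse)


# isinstance tests instances, so it is False when handed the class itself.
isinstance_on_class = isinstance(AccuracyAgentResponse, AccuracyAgentResponse)
```

Under these assumptions `isinstance` would not fit, since the value being compared is a class. The identity check and `issubclass` differ only in how they treat subclasses of the default model and a missing (`None`) response model, so the choice depends on which of those cases should count as custom.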

dirkbrnd · 2025-06-12T15:56:14Z

libs/agno/agno/eval/accuracy.py

@@ -419,7 +455,11 @@ def run_with_output(
        )

        if result is not None:
-            self.result.results.append(result)
+            if self._using_custom_response():
+                print(f"Evaluator Agent response: {result}")


feat: custom evaluator agent for accuracy evals

f2436c6

manuhortet requested a review from a team as a code owner May 28, 2025 09:24

mypy fix

96a1d82

dirkbrnd reviewed Jun 12, 2025
