feat: custom evaluator agent for accuracy evals #3389
base: main
# This is the agent we will use to perform the evaluation
evaluator_agent = Agent(
    model=OpenAIChat(id="o4-mini"),
    tools=[CalculatorTools(enable_all=True)],
Does it need calculator tools?
    expected_output="$1,739,130.43",
)

result: Optional[AccuracyResult] = evaluation.run(print_results=True)
How do you now access that custom response?
    """Check if the evaluator agent is using a custom response model"""
    if not self.evaluator_agent:
        return False
    return self.evaluator_agent.response_model is not AccuracyAgentResponse
Shouldn't this use isinstance?
Or actually like type comparison? I'm not sure, just asking
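For reference, a minimal sketch of how the three options behave. It assumes response_model holds a class rather than an instance, which is what the `is not AccuracyAgentResponse` comparison in the diff suggests. The classes below are hypothetical stand-ins, not the project's real models, and the helper names are made up for illustration.

```python
from typing import Optional


class AccuracyAgentResponse:
    """Stand-in for the default response model (the real one lives in the library)."""


class CustomResponse:
    """Hypothetical user-supplied response model."""


class ExtendedResponse(AccuracyAgentResponse):
    """Hypothetical subclass of the default response model."""


def is_custom_by_identity(model: Optional[type]) -> bool:
    # Mirrors the check in the diff: anything that is not exactly the default class.
    return model is not AccuracyAgentResponse


def is_custom_by_subclass(model: Optional[type]) -> bool:
    # Alternative: subclasses of the default are still treated as the default.
    return not (isinstance(model, type) and issubclass(model, AccuracyAgentResponse))


# isinstance() compares an *instance* against a class, so it is False when
# the value being tested is itself the class.
print(isinstance(AccuracyAgentResponse, AccuracyAgentResponse))

for candidate in (AccuracyAgentResponse, CustomResponse, ExtendedResponse, None):
    name = getattr(candidate, "__name__", candidate)
    print(name, is_custom_by_identity(candidate), is_custom_by_subclass(candidate))
```

Under that assumption, isinstance would not work here, and the identity check and `issubclass` differ only for subclasses of the default model. Both helpers report a response_model of None as custom, which may or may not be the intended behaviour.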
@@ -419,7 +455,11 @@ def run_with_output(
)

if result is not None:
    self.result.results.append(result)
    if self._using_custom_response():
        print(f"Evaluator Agent response: {result}")
Should this be a log call instead of print?
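A minimal sketch of what the logged version could look like, using the standard library logging module. The logger name and the helper function are hypothetical. The project may already have its own logging helper, in which case that would be the better fit.

```python
import logging

# Hypothetical logger name; a project-level logger would normally be reused.
logger = logging.getLogger("accuracy_eval")


def report_evaluator_response(result: object, using_custom_response: bool) -> None:
    """Log the evaluator agent's raw response when a custom response model is in use."""
    if using_custom_response:
        # Lazy %-formatting: the message is only built if INFO is enabled.
        logger.info("Evaluator Agent response: %s", result)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    report_evaluator_response({"score": 9}, using_custom_response=True)
```

This keeps the output controllable through the log level, so callers who do not want the raw response printed can silence it.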
Summary
Update to AccuracyEval to support providing a custom evaluator agent.
Type of change
Checklist
./scripts/format.sh and ./scripts/validate.sh