Is your feature request related to a problem? Please describe.
Labels: `feature`, `reporting`, `formatter`
Dependencies: Blocks Ticket #1 (YAML Metadata Parsing)
Description:
As a model evaluator, I want a report format that groups evaluation results by their tags. This will allow for a more granular analysis of a model's performance, helping to identify strengths and weaknesses in specific areas like `otp`, `ecto`, or `pattern-matching`.
This requires creating a new report format option, `:tagged_summary`.
Describe the solution you'd like
Acceptance Criteria:
- The `Evals.Formatter.format_report/3` function accepts a new value for the `:format` option: `:tagged_summary`.
- When `format: :tagged_summary` is used, the report output should:
  - List each unique tag found across all completed evals.
  - For each tag, display the average score of all evals that include that tag.
  - The list of tags should be sorted alphabetically for consistent output.
- The final report must be clean and human-readable.
- The implementation must not break the existing `:full` and `:summary` report formats.
- If no evals have tags, this section of the report should be gracefully omitted.
Implementation Notes:
- This work will be done in `evals/formatter.ex`.
- A new private function, such as `format_tagged_summary(results)`, will likely be needed.
- The logic will involve:
  - Reducing the `results` list into a map where keys are tags and values are lists of scores (e.g., `%{otp: [1.0, 0.5], elixir_core: [1.0, 1.0, 0.5], ...}`).
  - Transforming this map to calculate the average score for each tag.
  - Sorting the results by tag name before formatting them into strings.
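The reduce, average, and sort steps listed in the implementation notes could look roughly like the sketch below. The module name `TaggedSummarySketch`, the function `tag_averages/1`, and the result shape (`%{tags: [...], score: float}`) are assumptions for illustration; the actual result structure in `evals/formatter.ex` may differ.

```elixir
defmodule TaggedSummarySketch do
  # Hypothetical result shape: %{tags: [atom], score: float}.
  # Returns [{tag, average_score}] sorted alphabetically by tag name.
  def tag_averages(results) do
    results
    |> Enum.reduce(%{}, fn result, acc ->
      # Prepend this eval's score to the list kept under each of its tags.
      tags = Map.get(result, :tags) || []

      Enum.reduce(tags, acc, fn tag, inner ->
        Map.update(inner, tag, [result.score], &[result.score | &1])
      end)
    end)
    # Collapse each list of scores into its mean.
    |> Enum.map(fn {tag, scores} -> {tag, Enum.sum(scores) / length(scores)} end)
    # Sort by the tag's string form so atoms and strings order the same way.
    |> Enum.sort_by(fn {tag, _avg} -> to_string(tag) end)
  end
end
```

An eval with no tags contributes nothing to the map, so a run in which no evals are tagged yields an empty list, which the caller can use as the signal to omit the section.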
Example Output:
... (Overall Summary) ...
TAGGED SUMMARY:
----------------------------------------
ecosystem | 100.0%
elixir-core | 83.3%
otp | 75.0%
pattern-matching | 100.0%
security | 0.0%
... (Detailed Results or end of report) ...
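One way the section shown above might be rendered is sketched here. It assumes the averages arrive as `{tag, average}` tuples already sorted by tag, with averages in the range 0.0 to 1.0; the module name `TaggedSummaryRenderSketch`, the 40-character rule, and the column padding are illustrative choices, not part of the existing formatter.

```elixir
defmodule TaggedSummaryRenderSketch do
  # Separator line under the section header (width is an assumption).
  @rule String.duplicate("-", 40)

  # With no tagged evals, return an empty string so the section is omitted.
  def render([]), do: ""

  def render(tag_averages) do
    # Pad every tag to the width of the longest one so the "|" column aligns.
    width =
      tag_averages
      |> Enum.map(fn {tag, _avg} -> String.length(to_string(tag)) end)
      |> Enum.max()

    lines =
      Enum.map(tag_averages, fn {tag, avg} ->
        # Multiply by 100.0 so the value is a float even if avg is an integer.
        percent = :erlang.float_to_binary(avg * 100.0, decimals: 1)
        String.pad_trailing(to_string(tag), width) <> " | " <> percent <> "%"
      end)

    Enum.join(["TAGGED SUMMARY:", @rule | lines], "\n")
  end
end
```

Returning an empty string for the empty case keeps the "gracefully omitted" requirement in one place, so the `:full` and `:summary` code paths do not need to know whether any tags exist.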
Describe alternatives you've considered
No response
Additional context
No response