Feat: Implement `:tagged_summary` report format · Issue #2 · ash-project/evals · GitHub
Feat: Implement :tagged_summary report format #2
Open
@nshkrdotcom

Is your feature request related to a problem? Please describe.

Labels: feature, reporting, formatter

Dependencies: Blocks Ticket #1 (YAML Metadata Parsing)

Description:
As a model evaluator, I want a report format that groups evaluation results by their tags. This allows more granular analysis of a model's performance and helps identify strengths and weaknesses in specific areas such as otp, ecto, or pattern-matching.

This requires creating a new report format option, :tagged_summary.

Describe the solution you'd like

Acceptance Criteria:

  • The Evals.Formatter.format_report/3 function accepts a new value for the :format option: :tagged_summary.
  • When format: :tagged_summary is used, the report output should:
    1. List each unique tag found across all completed evals.
    2. For each tag, display the average score of all evals that include that tag.
    3. Sort the list of tags alphabetically for consistent output.
    4. Be clean and human-readable.
  • The implementation must not break the existing :full and :summary report formats.
  • If no evals have tags, this section of the report should be gracefully omitted.

Implementation Notes:

  • This work will be done in evals/formatter.ex.
  • A new private function, such as format_tagged_summary(results), will likely be needed.
  • The logic will involve:
    1. Reducing the results list into a map where keys are tags and values are lists of scores (e.g., %{"otp" => [1.0, 0.5], "elixir-core" => [1.0, 1.0, 0.5], ...}).
    2. Transforming this map to calculate the average score for each tag.
    3. Sorting the results by tag name before formatting them into strings.
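
The three steps above could be sketched roughly as follows. This is a minimal sketch, not the project's implementation: the module name `TaggedSummarySketch`, the function `tag_averages/1`, and the result shape (maps with a `:tags` list of strings and a float `:score`) are assumptions made for illustration, since the issue does not specify them.

```elixir
defmodule TaggedSummarySketch do
  # Hypothetical result shape: %{tags: [String.t()], score: float()}.
  # The real structure used in evals/formatter.ex may differ.

  @doc "Returns a list of {tag, average_score} tuples sorted by tag name."
  def tag_averages(results) do
    results
    # Step 1: reduce the results into %{tag => [scores]}.
    |> Enum.reduce(%{}, fn result, acc ->
      result
      |> Map.get(:tags, [])
      |> Enum.reduce(acc, fn tag, inner ->
        Map.update(inner, tag, [result.score], &[result.score | &1])
      end)
    end)
    # Step 2: average the scores collected for each tag.
    |> Enum.map(fn {tag, scores} -> {tag, Enum.sum(scores) / length(scores)} end)
    # Step 3: sort by tag name so the output order is stable.
    |> Enum.sort_by(fn {tag, _avg} -> tag end)
  end
end
```

Results without tags contribute nothing to the map, so an input with no tagged evals yields an empty list, which the caller can use to omit the section.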

Example Output:

... (Overall Summary) ...

TAGGED SUMMARY:
----------------------------------------
ecosystem            | 100.0%
elixir-core          | 83.3%
otp                  | 75.0%
pattern-matching     | 100.0%
security             | 0.0%

... (Detailed Results or end of report) ...
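
One possible way to render rows in the layout shown above is sketched below. The module name `TaggedSummaryFormatSketch`, the function `render/1`, the 20-character tag column, and the 40-character rule are assumptions inferred from the example output, not part of the existing formatter. The input is assumed to be a list of `{tag, average}` tuples, already sorted by tag, with averages as floats between 0.0 and 1.0.

```elixir
defmodule TaggedSummaryFormatSketch do
  # Column width and rule length are guesses based on the example output.
  @tag_width 20
  @rule String.duplicate("-", 40)

  # With no tags, return an empty string so the caller can omit the section.
  def render([]), do: ""

  def render(tag_averages) do
    rows =
      Enum.map(tag_averages, fn {tag, avg} ->
        # Convert the 0.0..1.0 average to a percentage with one decimal place.
        percent = :erlang.float_to_binary(avg * 100.0, decimals: 1)
        String.pad_trailing(tag, @tag_width) <> " | " <> percent <> "%"
      end)

    Enum.join(["TAGGED SUMMARY:", @rule | rows], "\n")
  end
end
```

Returning an empty string for the no-tags case is only one option; returning `nil` and filtering it out where the report sections are joined would work equally well.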

Describe alternatives you've considered

No response

Additional context

No response
