Custom judges by MichalMarsalek · Pull Request #1351 · code-golf/code-golf · GitHub
Custom judges #1351

Merged
merged 16 commits into from
May 14, 2025

MichalMarsalek
Collaborator
@MichalMarsalek MichalMarsalek commented Sep 5, 2024

I think that ideally there should be no Runs generator. Instead, there should be a separate input/args generator and a separate checker/judge, to allow more flexibility. I didn't want to refactor all of the holes (plus, some are implemented in a way where the output is not generated by solving for the input; the input and output are generated together in one step). So this PR keeps the current way of defining holes, but makes it easier to implement holes that require special output validation by introducing judges. We already had one such judge, the order-agnostic judge; this PR generalises that.
A judge takes the hole inputs and the user's output and returns the valid solution closest to the user's output. I consider this quite a clever way of supporting different custom validation rules without having to add special UI for each new hole (though special UI for a given judge can still be added, as I did for the order-agnostic one).

To implement a hole, one should:

  1. fill in the .Answer of each Run in the Run generator, or
  2. leave it empty and define a judge, which will compute it dynamically based on user output, or
  3. do both - a judge has access to the preset .Answer value and can be used to just modify that.
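
For illustration, the three options above could be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the `Run` and `Judge` types, `passes`, and `caseInsensitiveJudge` are simplified, hypothetical stand-ins, and only the preset `.Answer` field is taken from the description.

```go
package main

import (
	"fmt"
	"strings"
)

// Run is a hypothetical, simplified stand-in for one test run of a hole.
type Run struct {
	Args   []string
	Answer string // preset expected output; may be left empty (option 2)
}

// Judge takes a run and the user's stdout and returns the valid answer
// that is closest to that stdout. (Assumed signature, for illustration.)
type Judge func(run Run, stdout string) string

// defaultJudge covers option 1: the preset answer is the only valid one.
func defaultJudge(run Run, _ string) string { return run.Answer }

// caseInsensitiveJudge covers option 3: it starts from the preset answer
// and accepts the user's spelling when it differs only by letter case.
func caseInsensitiveJudge(run Run, stdout string) string {
	if strings.EqualFold(run.Answer, stdout) {
		return stdout
	}
	return run.Answer
}

// passes reports whether the user's output equals the judge's answer.
func passes(j Judge, run Run, stdout string) bool {
	return j(run, stdout) == stdout
}

func main() {
	run := Run{Args: []string{"red"}, Answer: "#FF0000"}
	fmt.Println(passes(defaultJudge, run, "#ff0000"))         // false
	fmt.Println(passes(caseInsensitiveJudge, run, "#ff0000")) // true
}
```

Because a judge returns an answer rather than a boolean, the existing expected-vs-actual display keeps working for every hole without judge-specific UI.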

This PR also adds a "Css Colors Inverse" hole as a showcase.

hole/judges.go Outdated
})
}

func getClosestMultiset(anyAnswer, stdout, itemDelimiter string) string {
Collaborator Author
@MichalMarsalek MichalMarsalek Sep 6, 2024

This function is just moved over from play.go, but support for case insensitivity was added.
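
As a rough idea of what an order-agnostic judge with optional case insensitivity might do, here is a minimal sketch. It is not the body of `getClosestMultiset` from this PR: the name `closestMultiset`, the `foldCase` parameter, and the matching strategy are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// closestMultiset returns a reordering of the expected items that is as
// close as possible to stdout: items the user printed are kept in the
// user's order (and spelling), and any missing items are appended.
// Hypothetical sketch; not the function from hole/judges.go.
func closestMultiset(expected, stdout, delim string, foldCase bool) string {
	key := func(s string) string {
		if foldCase {
			return strings.ToLower(s)
		}
		return s
	}

	// Group the expected items by comparison key, preserving their order.
	remaining := map[string][]string{}
	for _, item := range strings.Split(expected, delim) {
		remaining[key(item)] = append(remaining[key(item)], item)
	}

	// Keep each user item that still has an unmatched expected counterpart.
	var out []string
	for _, item := range strings.Split(stdout, delim) {
		if k := key(item); len(remaining[k]) > 0 {
			remaining[k] = remaining[k][1:]
			out = append(out, item)
		}
	}

	// Append the expected items the user did not print, in expected order.
	for _, item := range strings.Split(expected, delim) {
		if k := key(item); len(remaining[k]) > 0 {
			out = append(out, remaining[k][0])
			remaining[k] = remaining[k][1:]
		}
	}
	return strings.Join(out, delim)
}

func main() {
	fmt.Printf("%q\n", closestMultiset("a\nb\nc", "c\na", "\n", false))
	fmt.Printf("%q\n", closestMultiset("Red\nBlue", "blue\nRED", "\n", true))
}
```

When the user's output is a valid permutation, the result equals that output, so a plain equality check against the judge's answer passes.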

@MichalMarsalek MichalMarsalek marked this pull request as ready for review September 6, 2024 16:36
@MichalMarsalek MichalMarsalek marked this pull request as draft September 6, 2024 16:41
@MichalMarsalek MichalMarsalek marked this pull request as ready for review September 6, 2024 17:36
@btnlq btnlq mentioned this pull request Dec 31, 2024
hole/play.go Outdated
answers = []Answer{{Args: []string{}, Answer: code}}
// All other holes use the default judge which compares by equality (trimming the line endings)
if holeJudges[hole.ID] == nil {
holeJudges[hole.ID] = defaultJudge
Collaborator

I'm not sure I understand why these are updated per-request. They won't change for the lifetime of a deployment right? Could we not pre-populate these in an init block somewhere?

Collaborator Author

Yes, that makes sense.
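
A sketch of what pre-populating the defaults in an `init` block could look like. The `Judge` signature, the `holeIDs` list, and the custom judge shown here are hypothetical; only the names `holeJudges` and `defaultJudge` come from the diff above.

```go
package main

import (
	"fmt"
	"strings"
)

// Judge returns the valid answer closest to stdout. (Assumed signature.)
type Judge func(preset, stdout string) string

// defaultJudge accepts only the preset answer.
func defaultJudge(preset, _ string) string { return preset }

// holeIDs is a hypothetical list of all hole IDs known at startup.
var holeIDs = []string{"fizz-buzz", "css-colors-inverse", "quine"}

// holeJudges holds the custom judges; init fills in the default for the rest.
var holeJudges = map[string]Judge{
	"css-colors-inverse": func(preset, stdout string) string {
		if strings.EqualFold(preset, stdout) {
			return stdout
		}
		return preset
	},
}

// init runs once per process, before any request is served, so the map is
// only read afterwards and handlers never write to it.
func init() {
	for _, id := range holeIDs {
		if holeJudges[id] == nil {
			holeJudges[id] = defaultJudge
		}
	}
}

func main() {
	fmt.Println(len(holeJudges))                   // 3
	fmt.Println(holeJudges["fizz-buzz"]("x", "y")) // x
}
```

Besides avoiding repeated work, this keeps request handlers from writing to a shared map, which in Go is not safe to do from multiple goroutines without synchronisation.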

@JRaspass
Collaborator

Hmm, this failure looks legit. Can you please take a look, @MichalMarsalek?

# Failed test 'Trivial Tex Quine is blocked'
# at t/lang-features.t line 31
# expected: $(:answer("Trivial\n"), :!pass, :stderr("Quine in TeX must have at least one '\\' character."))
#      got: $(:answer(""), :pass(Bool::False), :stderr("Quine in TeX must have at least one '\\' character."))
# You failed 1 test of 12

@MichalMarsalek
Collaborator Author
MichalMarsalek commented May 11, 2025

@JRaspass Oh, sorry, I wasn't paying much attention to the e2e tests because they are flaky. Yes, I broke that expectation. Before the changes, the quine's expected answer was set at the very beginning; after the changes, it is set later. Since the TeX trivial-solution prevention fails the solution early without running the code (it keeps stdout empty and sets stderr), we never reach the step where the expected answer now gets set. I see 4 solutions:

  1. Accept the new behaviour and update the test.
  2. Leave the quine special case at the beginning.
  3. Run the code even when the solution fails the triviality check (but still set the stderr).
  4. Run the judges even when we receive empty stdout from the runner.

@JRaspass
Collaborator

Ah I see, I think it's fine to update the test(s). The main thing is that trivial quines are prevented, the exact observable output doesn't matter hugely.

@MichalMarsalek
Collaborator Author
MichalMarsalek commented May 12, 2025

Yes, but now that I think about it, it feels kinda weird to have that expected output empty if the code fails to output. I think option 4 is best here.
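
To make option 4 concrete, here is a minimal sketch of an evaluation step that always consults the judge, even when the run was rejected early and stdout is empty. All names here (`Result`, `quineJudge`, `evaluate`, the `rejected` flag) are hypothetical and are not taken from `hole/play.go`.

```go
package main

import "fmt"

// Judge returns the valid answer closest to stdout. (Assumed signature.)
type Judge func(preset, stdout string) string

// Result is a hypothetical, simplified outcome of one run.
type Result struct {
	Answer, Stdout, Stderr string
	Pass                   bool
}

// quineJudge returns a judge whose only valid answer is the submitted code.
func quineJudge(code string) Judge {
	return func(_, _ string) string { return code }
}

// evaluate calls the judge unconditionally, so the expected answer is filled
// in even if the code never ran; a rejected run can still never pass.
func evaluate(judge Judge, preset, stdout, stderr string, rejected bool) Result {
	answer := judge(preset, stdout)
	return Result{
		Answer: answer,
		Stdout: stdout,
		Stderr: stderr,
		Pass:   !rejected && answer == stdout,
	}
}

func main() {
	code := "Trivial\n"
	msg := "Quine in TeX must have at least one '\\' character."
	r := evaluate(quineJudge(code), "", "", msg, true)
	fmt.Printf("%q %v\n", r.Answer, r.Pass) // "Trivial\n" false
}
```

Under these assumptions, the blocked TeX quine again reports the code itself as the expected answer while still failing, which matches the original test expectation.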

@JRaspass JRaspass merged commit 2b856ab into code-golf:master May 14, 2025
3 checks passed