Custom judges #1351

MichalMarsalek · 2024-09-05T21:02:32Z

I think that ideally there should be no Runs generator. Instead, there should be a separate input/args generator and a separate checker/judge, which would allow more flexibility. I didn't want to refactor all of the holes (and some are implemented so that the output is not generated by solving for the input, but rather produced together with it in one step). So this PR keeps the current way of defining holes, but makes it easier to implement holes that require special output validation by introducing judges. We already had one such judge, the order-agnostic judge; this PR generalises it.
A judge takes the hole inputs and the user output and returns a valid solution close to the user's. I consider this quite a clever way of supporting different custom validation rules without having to add special UI for each new hole (though special UI for a given judge can still be added, as I did for the order-agnostic one).

To implement a hole, one should do one of the following:

fill in the .Answer of each Run in the Run generator, or
leave it empty and define a judge, which will compute it dynamically based on user output, or
do both - a judge has access to the preset .Answer value and can be used to just modify that.
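The three options above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Run` shape, the `Judge` signature, the `passes` helper, and the two example judges (`anyArgJudge`, `caseInsensitiveJudge`) are all hypothetical names chosen for the example. It also assumes that a solution passes when the judge's returned answer equals the trimmed user output.

```go
package main

import (
	"fmt"
	"strings"
)

// Run is a simplified, hypothetical stand-in for one test run of a hole.
type Run struct {
	Args   []string // inputs passed to the user's program
	Answer string   // preset expected output; may be empty if a judge computes it
}

// Judge returns a valid answer for the run, ideally the one closest to stdout.
// (Assumed signature, for illustration only.)
type Judge func(run Run, stdout string) string

// Option 1: rely on the preset Answer and ignore the user output.
func defaultJudge(run Run, _ string) string { return run.Answer }

// Option 2: leave Answer empty and compute it from the user output.
// Hypothetical rule: printing any one of the args is accepted.
func anyArgJudge(run Run, stdout string) string {
	for _, arg := range run.Args {
		if arg == stdout {
			return stdout
		}
	}
	if len(run.Args) == 0 {
		return ""
	}
	return run.Args[0] // show some valid answer when the output matches nothing
}

// Option 3: start from the preset Answer and adjust it to the user output.
// Hypothetical rule: letter case does not matter.
func caseInsensitiveJudge(run Run, stdout string) string {
	if strings.EqualFold(run.Answer, stdout) {
		return stdout
	}
	return run.Answer
}

// passes assumes a solution is correct when the judged answer equals the
// user output with trailing newlines trimmed.
func passes(judge Judge, run Run, stdout string) bool {
	trimmed := strings.TrimRight(stdout, "\n")
	return judge(run, trimmed) == trimmed
}

func main() {
	run := Run{Args: []string{"red", "green"}, Answer: "Red"}
	fmt.Println(passes(defaultJudge, run, "Red\n"))
	fmt.Println(passes(caseInsensitiveJudge, run, "rED\n"))
	fmt.Println(passes(anyArgJudge, run, "green\n"))
	fmt.Println(passes(anyArgJudge, run, "blue\n"))
}
```

Because a judge returns an answer rather than a boolean, the existing expected-versus-actual display can keep working without judge-specific UI.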

This PR also adds a "Css Colors Inverse" hole as a showcase.

MichalMarsalek · 2024-09-06T13:05:53Z

hole/judges.go

+	})
+}
+
+func getClosestMultiset(anyAnswer, stdout, itemDelimiter string) string {


This function is just moved over from play.go, but support for case insensitivity was added.
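For readers unfamiliar with the order-agnostic approach, here is a rough sketch of how such a function might work. It is not the code from `hole/judges.go`: the name `closestMultiset` and the `caseFold` parameter are assumptions made for this example, and the real signature shown in the diff takes only three strings. The idea is to keep the user's items, in the user's order, wherever they match an expected item, and then append whatever expected items are still missing.

```go
package main

import (
	"fmt"
	"strings"
)

// closestMultiset returns a reordering of the items in anyAnswer that is as
// close as possible to stdout. Hypothetical sketch; caseFold is an assumed
// parameter that makes item matching case-insensitive.
func closestMultiset(anyAnswer, stdout, itemDelimiter string, caseFold bool) string {
	key := func(s string) string {
		if caseFold {
			return strings.ToLower(s)
		}
		return s
	}

	// Count how many times each expected item must appear.
	expected := strings.Split(anyAnswer, itemDelimiter)
	remaining := make(map[string]int, len(expected))
	for _, item := range expected {
		remaining[key(item)]++
	}

	// Keep the user's items (with the user's spelling) while they are still needed.
	result := make([]string, 0, len(expected))
	for _, item := range strings.Split(stdout, itemDelimiter) {
		if k := key(item); remaining[k] > 0 {
			remaining[k]--
			result = append(result, item)
		}
	}

	// Append expected items the user did not produce, in their original order.
	for _, item := range expected {
		if k := key(item); remaining[k] > 0 {
			remaining[k]--
			result = append(result, item)
		}
	}
	return strings.Join(result, itemDelimiter)
}

func main() {
	fmt.Printf("%q\n", closestMultiset("a\nb\nc", "c\na\nb", "\n", false))
	fmt.Printf("%q\n", closestMultiset("a\nb\nc", "c\nx\na", "\n", false))
	fmt.Printf("%q\n", closestMultiset("Red\nBlue", "blue\nRED", "\n", true))
}
```

When the user output is a valid permutation, the returned string equals the user output, so a plain equality check succeeds; otherwise the result differs only where items are wrong or missing.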

JRaspass · 2025-04-30T18:34:34Z

hole/play.go

-		answers = []Answer{{Args: []string{}, Answer: code}}
+	// All other holes use the default judge which compares by equality (trimming the line endings)
+	if holeJudges[hole.ID] == nil {
+		holeJudges[hole.ID] = defaultJudge


I'm not sure I understand why these are updated per-request. They won't change for the lifetime of a deployment, right? Could we not pre-populate these in an init block somewhere?

Yes, that makes sense.
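A minimal sketch of the suggested pre-population, assuming a package-level map keyed by hole ID. The `Judge` signature, the `holeIDs` slice, and `lenientJudge` are hypothetical stand-ins, not the actual definitions in the repository.

```go
package main

import "fmt"

// Judge is an assumed, simplified signature for illustration.
type Judge func(expected, stdout string) string

// defaultJudge returns the preset answer, so the caller compares by equality.
func defaultJudge(expected, _ string) string { return expected }

// lenientJudge is a hypothetical custom judge that accepts any output.
func lenientJudge(_, stdout string) string { return stdout }

// holeIDs stands in for the configured list of holes (hypothetical values).
var holeIDs = []string{"fizz-buzz", "css-colors-inverse", "quine"}

// holeJudges initially lists only the holes that need a custom judge.
var holeJudges = map[string]Judge{
	"css-colors-inverse": lenientJudge,
}

// init runs once at program start, after package-level variables are
// initialised, so request handlers never need to mutate the map.
func init() {
	for _, id := range holeIDs {
		if holeJudges[id] == nil {
			holeJudges[id] = defaultJudge
		}
	}
}

func main() {
	fmt.Println(len(holeJudges))
	fmt.Println(holeJudges["fizz-buzz"]("Fizz", "Buzz"))
}
```

Filling the map once also avoids concurrent writes to a shared map from request handlers, which Go does not allow without synchronisation.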

JRaspass · 2025-05-11T12:05:04Z

Hmm, this failure looks legit. Can you please take a look, @MichalMarsalek?

# Failed test 'Trivial Tex Quine is blocked'
# at t/lang-features.t line 31
# expected: $(:answer("Trivial\n"), :!pass, :stderr("Quine in TeX must have at least one '\\' character."))
#      got: $(:answer(""), :pass(Bool::False), :stderr("Quine in TeX must have at least one '\\' character."))
# You failed 1 test of 12

MichalMarsalek · 2025-05-11T18:26:05Z

@JRaspass Oh, sorry, I wasn't paying much attention to the e2e tests because they are flaky. Yes, I broke that expectation. Before the changes, the quine's expected answer was set at the very beginning; after the changes, it is set later. Since the TeX trivial-solution prevention fails the solution early without running the code (it keeps stdout empty and sets stderr), we never reach the step where the expected answer is now set. I see 4 solutions:

Accept the new behaviour and update the test.
Leave the quine special case at the beginning.
Run the code even when the solution fails the triviality check (but still set the stderr).
Run the judges even when we receive empty stdout from the runner.

JRaspass · 2025-05-11T22:18:52Z

Ah I see, I think it's fine to update the test(s). The main thing is that trivial quines are prevented, the exact observable output doesn't matter hugely.

MichalMarsalek · 2025-05-12T06:26:36Z

Yes, but now that I think about it, it feels kinda weird to have that expected output empty if the code fails to output. I think option 4 is best here.
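Option 4 might look roughly like the sketch below. This is an illustration under stated assumptions, not the PR's code: `Result`, `play`, `quineJudge`, and the `execute` callback are hypothetical names, and the pass condition is simplified. The point is only that the judge runs unconditionally, so the expected answer is populated even when the code was never executed.

```go
package main

import (
	"fmt"
	"strings"
)

// Result is a simplified, hypothetical view of what a run reports.
type Result struct {
	Pass                   bool
	Stdout, Stderr, Answer string
}

// Judge is an assumed signature: it sees the submitted code and its output.
type Judge func(code, stdout string) string

// quineJudge expects the program to print its own source.
func quineJudge(code, _ string) string { return code }

// play runs the triviality check, then the code, then the judge.
// execute stands in for the real sandboxed runner.
func play(code string, judge Judge, execute func(string) string) Result {
	var res Result
	blocked := !strings.Contains(code, `\`)
	if blocked {
		// Fail early without running the code: stdout stays empty.
		res.Stderr = "Quine in TeX must have at least one '\\' character."
	} else {
		res.Stdout = execute(code)
	}

	// Option 4: the judge runs even when stdout is empty, so Answer is set.
	res.Answer = judge(code, res.Stdout)
	res.Pass = !blocked && res.Answer == res.Stdout
	return res
}

func main() {
	echo := func(code string) string { return code } // pretend every program is a quine
	fmt.Printf("%+v\n", play("Trivial", quineJudge, echo))
	fmt.Printf("%+v\n", play(`\Trivial`, quineJudge, echo))
}
```

With this ordering, a blocked trivial quine still fails but reports a non-empty expected answer.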

config/holes.go

hole/css-colors-inverse.go

init

7c186e9

MichalMarsalek commented Sep 6, 2024

MichalMarsalek added 2 commits September 6, 2024 15:50

fix errors

0076b60

add css inverse hole to showcase a custom judge

aa47515

MichalMarsalek marked this pull request as ready for review September 6, 2024 16:36

MichalMarsalek marked this pull request as draft September 6, 2024 16:41

MichalMarsalek added 3 commits September 6, 2024 18:57

refactor multisetJudge

4f83050

rename to MultisetItemDelimiter to make it clear this has an effect o…

368f75c

…fturing the multisetjudge on

support case-insensitive multiset judge

255a833

MichalMarsalek mentioned this pull request Sep 6, 2024

Css Colors Inverse #1353

Open

fix case insensitive multiset matching

0b8515c

MichalMarsalek marked this pull request as ready for review September 6, 2024 17:36

MichalMarsalek added 2 commits September 7, 2024 09:38

optimize cases where there's a single sol

d031b7d

Merge remote-tracking branch 'upstream/master' into judges

d7cd217

btnlq mentioned this pull request Dec 31, 2024

Add Set hole #1058

Closed

MichalMarsalek added 3 commits March 25, 2025 20:35

merge master

ee52c53

Merge remote-tracking branch 'upstream/master' into judges

0eacdb3

lint

f45a0c2

JRaspass reviewed Apr 30, 2025

init judges in init & fix css inverse

521ebc8

run judge for empty output

5f5da24

refactor the code that actually runs the code into a new runCode func

96d3bf7

JRaspass requested changes May 13, 2025

config/holes.go

hole/css-colors-inverse.go (outdated)

hole/css-colors-inverse.go (outdated)

use init block for css inverse def

4d03681

JRaspass merged commit 2b856ab into code-golf:master May 14, 2025
3 checks passed
