Describe the bug
Scoring is much slower than with the livecodebench (LCB) repo.
To Reproduce
Compared to the LCB repo, scoring takes significantly longer and appears to be single-threaded.
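To narrow down where the time goes, a single `codegen_metrics` call can be timed in isolation. The sketch below is a minimal standalone reproduction; the import path and the toy sample values are assumptions, while the sample/generation layout mirrors the `codegen_metric` excerpt under "Expected behavior" below.

```python
# Standalone timing sketch; the import path is an assumption about where
# lighteval vendors the LiveCodeBench evaluation code.
import json
import time

from lighteval.tasks.extended.lcb.codegen_metrics import codegen_metrics

# One toy stdin/stdout problem that just echoes its input (values are made up;
# the dict layout mirrors the codegen_metric excerpt below).
sample = {
    "inputs": ["hello\n"] * 8,
    "outputs": ["hello\n"] * 8,
    "fn_name": None,  # stdin/stdout task, no function entry point
}
evaluation_samples = [{"input_output": json.dumps(sample)}]
generations = [["import sys\nprint(sys.stdin.readline().strip())"]]

start = time.perf_counter()
metrics, _ = codegen_metrics(
    evaluation_samples,
    generations,
    k_list=[1],
    num_process_evaluate=64,
)
print(f"pass@1={metrics['pass@1']}  wall_time={time.perf_counter() - start:.2f}s")
```

Running this while watching CPU usage (e.g. in `htop`) should show whether more than one worker process ever becomes busy.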
Expected behavior
Scoring time should match the LCB repo when the same number of processes is used. I have tried setting num_process_evaluate in codegen_metric:
```python
# codegen_metric from lighteval's livecodebench task, with num_process_evaluate set to 64
def codegen_metric(predictions: list[str], formatted_doc: Doc, **kwargs) -> float:
    """Estimates the Pass@1 metric for the code generation task.
    Extracts the code from each prediction, runs it for each sample and generation,
    and computes Pass@1 over the outputs.
    """
    # Extract generated code snippets
    generated_code_snippets = [[extract_code(pred) for pred in predictions]]  # noqa: F841
    evaluation_sample = {  # noqa: F841
        "inputs": formatted_doc.specific["inputs"],
        "outputs": formatted_doc.specific["outputs"],
        "fn_name": formatted_doc.specific["fn_name"],
    }
    # Wrap in a single-element batch: codegen_metrics expects a list of samples
    evaluation_sample = [{"input_output": json.dumps(evaluation_sample)}]
    metrics, _ = codegen_metrics(
        evaluation_sample,
        generated_code_snippets,
        k_list=[1],  # Only run for Pass@1
        num_process_evaluate=64,
    )
    return metrics["pass@1"]
```
but saw no improvement in scoring time.
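One possible explanation (an assumption about the call pattern, not verified here): codegen_metric is invoked once per document, so evaluation_sample always contains a single entry, and if codegen_metrics parallelizes across samples, num_process_evaluate=64 still leaves at most one worker busy per call. Below is a hedged sketch of what batching all documents into one call could look like; batched_codegen_pass_at_1 is a hypothetical helper, not an existing lighteval API.

```python
# Hypothetical batched scoring sketch -- not lighteval's current API. If
# codegen_metrics parallelizes across samples, passing every document in one
# call should keep num_process_evaluate workers busy.
import json

from lighteval.tasks.extended.lcb.codegen_metrics import codegen_metrics  # path assumed, as above


def batched_codegen_pass_at_1(docs, code_per_doc, num_process_evaluate=64):
    """docs: list of formatted_doc.specific dicts; code_per_doc: one list of
    extracted code snippets per document, aligned with docs."""
    evaluation_samples = [
        {
            "input_output": json.dumps(
                {
                    "inputs": doc["inputs"],
                    "outputs": doc["outputs"],
                    "fn_name": doc["fn_name"],
                }
            )
        }
        for doc in docs
    ]
    metrics, _ = codegen_metrics(  # same call signature as in the excerpt above
        evaluation_samples,
        code_per_doc,
        k_list=[1],
        num_process_evaluate=num_process_evaluate,
    )
    return metrics["pass@1"]
```

This is roughly how the LCB repo drives its evaluation in one batched call, which may be why the same num_process_evaluate value scales there.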
Version info
Please provide your operating system, lighteval version or commit if you installed from main, and pip/conda environment if your problem concerns dependencies.
latest master
```bash
conda create -n eval python=3.11
pip install vllm==0.7.2
pip install git+https://github.com/huggingface/lighteval.git#egg=lighteval[extended_tasks] math-verify==0.5.2
```