8000 [BUG] Very slow livecodebench scoring · Issue #650 · huggingface/lighteval · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content
[BUG] Very slow livecodebench scoring #650
Open
@rawsh

Description

@rawsh

Describe the bug

Much slower scoring compared to livecodebench repo.

To Reproduce

Compare to LCB repo, scoring takes significantly longer and seems single threaded

Expected behavior

Match LCB scoring time with same number of threads set. I have tried setting num_process:

def codegen_metric(predictions: list[str], formatted_doc: Doc, **kwargs) -> float:
    """Estimates the Pass@1 metric for the code generation task.
    Extract the code from each prediction, Runs it for each sample and generations,
    and computes the Pass@1 over the outputs.
    """
    # Extract generated code snippets
    generated_code_snippets = [[extract_code(pred) for pred in predictions]]  # noqa: F841
    evaluation_sample = {  # noqa: F841
        "inputs": formatted_doc.specific["inputs"],
        "outputs": formatted_doc.specific["outputs"],
        "fn_name": formatted_doc.specific["fn_name"],
    }
    # This is a list of lists because
    evaluation_sample = [{"input_output": json.dumps(evaluation_sample)}]

    metrics, _ = codegen_metrics(
        evaluation_sample,
        generated_code_snippets,
        k_list=[1],  # Only run for Pass@1
        num_process_evaluate=64,
    )
    return metrics["pass@1"]

with no improvement in scoring time

Version info

Please provide your operating system, lighteval version or commit if you installed from main, and pip/conda environment if your problem concerns dependencies.

latest master

conda create -n eval python=3.11
pip install vllm==0.7.2
pip install git+https://github.com/huggingface/lighteval.git#egg=lighteval[extended_tasks] math-verify==0.5.2

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0