Insights: huggingface/lighteval
Overview
1 Release published by 1 person
- v0.10.0 (published May 22, 2025)
29 Pull requests merged by 14 people
- Add Bulgarian and Macedonian literals (#769, merged Jun 6, 2025)
- [#794] Fix: Assign SummaCZS instance to self.summac in Faithfulness metric (#795, merged Jun 6, 2025)
- [IFEval] Speed up think tag removal (#792, merged Jun 4, 2025)
- add a regex to remove think tags before evaluating ifeval (#791, merged Jun 4, 2025)
- fix: multiple typos of different value (#782, merged May 28, 2025)
- Making bootstrap_iters an arg (#697, merged May 28, 2025)
- Adds GSM-PLUS (#780, merged May 28, 2025)
- Bump dev version to 0.10.1.dev0 (#777, merged May 23, 2025)
- Async vllm (#693, merged May 22, 2025)
- Bump ruff version (#774, merged May 22, 2025)
- Nanotron, Multilingual tasks update + misc (#756, merged May 22, 2025)
- Add missing model_name fixes (#768, merged May 21, 2025)
- add dependencies to run after pip install (#767, merged May 21, 2025)
- fix custom model example (#766, merged May 21, 2025)
- Adds template for custom path saving results (#755, merged May 21, 2025)
- Allow for model kwargs when loading transformers from pretrained (#754, merged May 21, 2025)
- Add MCQ support to Yourbench evaluation (#734, merged May 20, 2025)
- Fix task metric type mismatch (#743, merged May 20, 2025)
- Adds multimodal support and MMMU pro (#675, merged May 19, 2025)
- Fix extractive match (#746, merged May 19, 2025)
- Added Flores (#717, merged May 19, 2025)
- Update main_endpoint.py (#739, merged May 19, 2025)
- Fix litellm (#736, merged May 16, 2025)
- Adds More Generative tasks (#694, merged May 16, 2025)
- Update README.md (#733, merged May 15, 2025)
- Fix revision arg for vLLM tokenizer (#721, merged May 15, 2025)
- Added support for quantization in vLLM backend (#690, merged May 12, 2025)
- Fix tqdm logging (#711, merged May 12, 2025)
- add livecodebench v6 (#712, merged May 12, 2025)
13 Pull requests opened by 11 people
- update for CB (#714, opened May 9, 2025)
- Adds RULER benchmark (#722, opened May 15, 2025)
- Add Chinese (zh) Translation of Documentation (#744, opened May 19, 2025)
- Newer `openai` and loosened `httpx` (#758, opened May 21, 2025)
- Add Romanian literals (#764, opened May 21, 2025)
- Add TranslationLiterals for Language.DANISH (#770, opened May 22, 2025)
- Add support for vLLM KV-cache quantization (#773, opened May 22, 2025)
- Update translation_literals.py with icelandic (#775, opened May 22, 2025)
- Complete TranslationLiterals for Language.ESTONIAN (#779, opened May 23, 2025)
- Add org_to_bill parameter to documentation (#781, opened May 26, 2025)
- fix: update python api user docs (#784, opened May 27, 2025)
- fix(openai): improve tokenizer fallback and remove env_config param (#786, opened May 27, 2025)
- fix context size check in sglang model (#787, opened May 29, 2025)
32 Issues closed by 5 people
- [BUG] IFEval metrics incorrect for reasoning models (#790, closed Jun 4, 2025)
- [BUG] 1 (#789, closed Jun 4, 2025)
- [EVAL] GSM Plus (#778, closed May 28, 2025)
- [FT] bump ruff version (#772, closed May 22, 2025)
- [BUG] fix dependencies when doing fresh install (#725, closed May 21, 2025)
- [BUG] pydantic throws error with custom evaluator (#757, closed May 21, 2025)
- [FT] Custom details and results saving path (#753, closed May 21, 2025)
- [FT] better support for model loading args in transformers (#752, closed May 21, 2025)
- [BUG] Python API docs generating splits forever (#762, closed May 21, 2025)
- [FT] Add multimodal for transformers models (#729, closed May 19, 2025)
- [EVAL] adds FLORES (#727, closed May 19, 2025)
- [BUG] remove use chat template flag for litellm (#738, closed May 19, 2025)
- [FT] Controlling the number of experiments/trials to run (#718, closed May 16, 2025)
- [BUG] add in the readme that we do not support windows (#728, closed May 15, 2025)
- lighteval with llama3.2 [RuntimeError: No executable batch size found, reached zero.] (#525, closed May 15, 2025)
- updated pypi package with torch>=2.0,<3.0 (#526, closed May 15, 2025)
- [FT] Faster generation with TransformersModel by using less padding (#531, closed May 15, 2025)
- More flexibility in parameters for OpenAI / LiteLLM (#544, closed May 15, 2025)
- How can I use "community|alghafa:meta_ar_dialects" as a task? (#554, closed May 15, 2025)
- [BUG] Nanotron runner imports non-existent (#555, closed May 15, 2025)
- Something wrong in the parser for `generation_parameters` in `main_sglang.py` (#590, closed May 15, 2025)
- [BUG] evaluation on minervamath (#628, closed May 15, 2025)
- [EVAL] Clarification on Reproducing DeepSeek R1 Results with do_sampling=True (#631, closed May 15, 2025)
- [BUG] OSError: [Errno 22] Invalid argument (#632, closed May 15, 2025)
- [BUG] ImportError: cannot import name 'T_co' from 'torch.utils.data.distributed' (#633, closed May 15, 2025)
- [BUG] vLLM backend hangs with DDP (#670, closed May 15, 2025)
- [EVAL] Correct way to handle GSM8K in Turkish Evals? (#692, closed May 15, 2025)
- [BUG] Out of Memory problems with lighteval (#700, closed May 15, 2025)
- [BUG] CANNOT set override_batch_size when lighteval accelerate (#720, closed May 15, 2025)
- cannot import name 'EnvConfig' from 'lighteval.utils.utils' (#707, closed May 12, 2025)
37 Issues opened by 13 people
- [BUG] `test` split forced to hit `Careful` warning (#801, opened Jun 7, 2025)
- [BUG] Forced to hit `You cannot select the number of dataset splits` with `litellm` (#800, opened Jun 7, 2025)
- [FT] integrate typo checker to check typos like `refenrence` (#799, opened Jun 7, 2025)
- [FT] supporting train-time vs test-time metrics (#798, opened Jun 7, 2025)
- [FT] more docstrings and typing in `Doc` (#797, opened Jun 7, 2025)
- [FT] `StrEnum` for `suites` to intuitively document options (#796, opened Jun 6, 2025)
- [BUG] Faithfulness metric fails because the SummaCZS model is instantiated but never assigned (#794, opened Jun 5, 2025)
- [BUG] support direct evaluation of local API and local data sets? (#793, opened Jun 5, 2025)
- [FT] Store system prompt in results (#788, opened Jun 3, 2025)
- [BUG] OpenAIClient fails when using newer GPT models (#785, opened May 27, 2025)
- [BUG] Python API documentation (#783, opened May 27, 2025)
- [BUG] Is AIME24 broken? (#771, opened May 22, 2025)
- [FT] Add tests for nanotron (#765, opened May 21, 2025)
- [FT] Python API docs using small model that can run on Mac (#761, opened May 21, 2025)
- [BUG] custom model docs don't run: missing imports (#760, opened May 21, 2025)
- [BUG] incorrect type hints such as `callable` (#759, opened May 21, 2025)
- [FT] `lighteval file` eval backend to work with stored JSONL/CSV files (#750, opened May 20, 2025)
- [FT] add `py.typed` so `lighteval` can work with type checkers (#749, opened May 20, 2025)
- [BUG] sync `LightevalTaskConfig` docstring with types/defaults (#748, opened May 20, 2025)
- [FT] allow `httpx>0.27` (#747, opened May 19, 2025)
- [FT] Manage script and language in the Language enum (#745, opened May 19, 2025)
- [BUG] Sampling and max new tokens params for accelerate backend not being applied correctly (#742, opened May 19, 2025)
- [EVAL] TauBench: (#741, opened May 19, 2025)
- [EVAL] SciCode: research coding benchmark (#740, opened May 19, 2025)
- Error with value of `n` (#737, opened May 19, 2025)
- [FT] Load entire benchmark (data + spec) from the hub (#735, opened May 16, 2025)
- [BUG] Optimize tokenization (#732, opened May 15, 2025)
- [EVAL] HELMET: long context evals (#731, opened May 15, 2025)
- [EVAL] SWEBENCH multilingual (#730, opened May 15, 2025)
- [EVAL] Add RULER for evaluating long context (#726, opened May 15, 2025)
- [FT] Add tests for `VLLMModel` base methods (#724, opened May 15, 2025)
- [FT] Continuous batching for transformers (#723, opened May 15, 2025)
- [FT] Support evaluations with tool use (#719, opened May 15, 2025)
- Call for contributions: Translate lighteval's doc into Chinese (#716, opened May 14, 2025)
- [BUG] Installing lighteval breaks hydra-core (#713, opened May 9, 2025)
22 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- [BUG] Misleading metric type of LightevalTaskConfig (#710, commented on May 15, 2025 • 0 new comments)
- [FT] Improve Documentation and Examples (#682, commented on May 15, 2025 • 0 new comments)
- [BUG] encounter an ArrowInvalid error while saving experiment tracker (#660, commented on May 15, 2025 • 0 new comments)
- [FT] Log progress bar on main process (#662, commented on May 15, 2025 • 0 new comments)
- [BUG] Pipeline does not work with GenerationConfig (#657, commented on May 15, 2025 • 0 new comments)
- [BUG] Very slow livecodebench scoring (#650, commented on May 15, 2025 • 0 new comments)
- [BUG] core dump when using endpoint litellm (#623, commented on May 15, 2025 • 0 new comments)
- [FT] Build in a way to specify specific IDs/Lines in Dataset to use as few-shot examples in the same split (#634, commented on May 15, 2025 • 0 new comments)
- [BUG] Num samples not respected (#618, commented on May 15, 2025 • 0 new comments)
- [EVAL] Big-Bench Extra Hard (BBEH) (#600, commented on May 15, 2025 • 0 new comments)
- [FT] Assistant Response Prefilling (#591, commented on May 15, 2025 • 0 new comments)
- [EVAL] Add TUMLU benchmark (#577, commented on May 15, 2025 • 0 new comments)
- [FT] LiteLLM concurrency parameters hard-coded (#567, commented on May 15, 2025 • 0 new comments)
- <think> tags for thinking models (#513, commented on May 15, 2025 • 0 new comments)
- [EVAL] Adding PHARE (#696, commented on May 18, 2025 • 0 new comments)
- [FT] Custom model to TransformersModel (#489, commented on May 19, 2025 • 0 new comments)
- Loading local data for custom tasks (#681, commented on May 20, 2025 • 0 new comments)
- [FT] Numpy 2.0 support (#416, commented on Jun 5, 2025 • 0 new comments)
- new metrics and pr-fouras dataset add (#558, commented on Jun 6, 2025 • 0 new comments)
- Nanotron model updates (#652, commented on May 12, 2025 • 0 new comments)
- [WIP] Fix nanotron compatibility (#706, commented on May 20, 2025 • 0 new comments)
- refacto prompt building (#709, commented on Jun 5, 2025 • 0 new comments)