Insights: huggingface/lighteval
Overview
1 Release published by 1 person
- v0.10.0 (published May 22, 2025)
29 Pull requests merged by 14 people
- Add Bulgarian and Macedonian literals (#769, merged Jun 6, 2025)
- [#794] Fix: Assign SummaCZS instance to self.summac in Faithfulness metric (#795, merged Jun 6, 2025)
- [IFEval] Speed up think tag removal (#792, merged Jun 4, 2025)
- add a regex to remove think tags before evaluating ifeval (#791, merged Jun 4, 2025)
- fix: multiple typos of different value (#782, merged May 28, 2025)
- Making bootstrap_iters an arg (#697, merged May 28, 2025)
- Adds GSM-PLUS (#780, merged May 28, 2025)
- Bump dev version to 0.10.1.dev0 (#777, merged May 23, 2025)
- Async vllm (#693, merged May 22, 2025)
- Bump ruff version (#774, merged May 22, 2025)
- Nanotron, Multilingual tasks update + misc (#756, merged May 22, 2025)
- Add missing model_name fixes (#768, merged May 21, 2025)
- add dependencies to run after pip install (#767, merged May 21, 2025)
- fix custom model example (#766, merged May 21, 2025)
- Adds template for custom path saving results (#755, merged May 21, 2025)
- Allow for model kwargs when loading transformers from pretrained (#754, merged May 21, 2025)
- Add MCQ support to Yourbench evaluation (#734, merged May 20, 2025)
- Fix task metric type mismatch (#743, merged May 20, 2025)
- Adds multimodal support and MMMU pro (#675, merged May 19, 2025)
- Fix extractive match (#746, merged May 19, 2025)
- Added Flores (#717, merged May 19, 2025)
- Update main_endpoint.py (#739, merged May 19, 2025)
- Fix litellm (#736, merged May 16, 2025)
- Adds More Generative tasks (#694, merged May 16, 2025)
- Update README.md (#733, merged May 15, 2025)
- Fix revision arg for vLLM tokenizer (#721, merged May 15, 2025)
- Added support for quantization in vLLM backend (#690, merged May 12, 2025)
- Fix tqdm logging (#711, merged May 12, 2025)
- add livecodebench v6 (#712, merged May 12, 2025)
13 Pull requests opened by 11 people
- update for CB (#714, opened May 9, 2025)
- Adds RULER benchmark (#722, opened May 15, 2025)
- Add Chinese (zh) Translation of Documentation (#744, opened May 19, 2025)
- Newer `openai` and loosened `httpx` (#758, opened May 21, 2025)
- Add Romanian literals (#764, opened May 21, 2025)
- Add TranslationLiterals for Language.DANISH (#770, opened May 22, 2025)
- Add support for vLLM KV-cache quantization (#773, opened May 22, 2025)
- Update translation_literals.py with icelandic (#775, opened May 22, 2025)
- Complete TranslationLiterals for Language.ESTONIAN (#779, opened May 23, 2025)
- Add org_to_bill parameter to documentation (#781, opened May 26, 2025)
- fix: update python api user docs (#784, opened May 27, 2025)
- fix(openai): improve tokenizer fallback and remove env_config param (#786, opened May 27, 2025)
- fix context size check in sglang model (#787, opened May 29, 2025)
32 Issues closed by 5 people
- [BUG] IFEval metrics incorrect for reasoning models (#790, closed Jun 4, 2025)
- [BUG] 1 (#789, closed Jun 4, 2025)
- [EVAL] GSM Plus (#778, closed May 28, 2025)
- [FT] bump ruff version (#772, closed May 22, 2025)
- [BUG] fix dependencies when doing fresh install (#725, closed May 21, 2025)
- [BUG] pydantic throws error with custom evaluator (#757, closed May 21, 2025)
- [FT] Custom details and results saving path (#753, closed May 21, 2025)
- [FT] better support for model loading args in transformers (#752, closed May 21, 2025)
- [BUG] Python API docs generating splits forever (#762, closed May 21, 2025)
- [FT] Add multimodal for transformers models (#729, closed May 19, 2025)
- [EVAL] adds FLORES (#727, closed May 19, 2025)
- [BUG] remove use chat template flag for litellm (#738, closed May 19, 2025)
- [FT] Controlling the number of experiments/trials to run (#718, closed May 16, 2025)
- [BUG] add in the readme that we do not support windows (#728, closed May 15, 2025)
- lighteval with llama3.2 [RuntimeError: No executable batch size found, reached zero.] (#525, closed May 15, 2025)
- updated pypi package with torch>=2.0,<3.0 (#526, closed May 15, 2025)
- [FT] Faster generation with TransformersModel by using less padding (#531, closed May 15, 2025)
- More flexibility in parameters for OpenAI / LiteLLM (#544, closed May 15, 2025)
- How can I use "community|alghafa:meta_ar_dialects" as a task? (#554, closed May 15, 2025)
- [BUG] Nanotron runner imports non-existent (#555, closed May 15, 2025)
- Something wrong in the parser for `generation_parameters` in `main_sglang.py` (#590, closed May 15, 2025)
- [BUG] evaluation on minervamath (#628, closed May 15, 2025)
- [EVAL] Clarification on Reproducing DeepSeek R1 Results with do_sampling=True (#631, closed May 15, 2025)
- [BUG] OSError: [Errno 22] Invalid argument (#632, closed May 15, 2025)
- [BUG] ImportError: cannot import name 'T_co' from 'torch.utils.data.distributed' (#633, closed May 15, 2025)
- [BUG] vLLM backend hangs with DDP (#670, closed May 15, 2025)
- [EVAL] Correct way to handle GSM8K in Turkish Evals? (#692, closed May 15, 2025)
- [BUG] Out of Memory problems with lighteval (#700, closed May 15, 2025)
- [BUG] CANNOT set override_batch_size when lighteval accelerate (#720, closed May 15, 2025)
- cannot import name 'EnvConfig' from 'lighteval.utils.utils' (#707, closed May 12, 2025)
37 Issues opened by 13 people
- [BUG] `test` split forced to hit `Careful` warning (#801, opened Jun 7, 2025)
- [BUG] Forced to hit `You cannot select the number of dataset splits` with `litellm` (#800, opened Jun 7, 2025)
- [FT] integrate typo checker to check typos like `refenrence` (#799, opened Jun 7, 2025)
- [FT] supporting train-time vs test-time metrics (#798, opened Jun 7, 2025)
- [FT] more docstrings and typing in `Doc` (#797, opened Jun 7, 2025)
- [FT] `StrEnum` for `suites` to intuitively document options (#796, opened Jun 6, 2025)
- [BUG] Faithfulness metric fails because the SummaCZS model is instantiated but never assigned (#794, opened Jun 5, 2025)
- [BUG] support direct evaluation of local API and local data sets? (#793, opened Jun 5, 2025)
- [FT] Store system prompt in results (#788, opened Jun 3, 2025)
- [BUG] OpenAIClient fails when using newer GPT models (#785, opened May 27, 2025)
- [BUG] Python API documentation (#783, opened May 27, 2025)
- [BUG] Is AIME24 broken? (#771, opened May 22, 2025)
- [FT] Add tests for nanotron (#765, opened May 21, 2025)
- [FT] Python API docs using small model that can run on Mac (#761, opened May 21, 2025)
- [BUG] custom model docs don't run: missing imports (#760, opened May 21, 2025)
- [BUG] incorrect type hints such as `callable` (#759, opened May 21, 2025)
- [FT] `lighteval file` eval backend to work with stored JSONL/CSV files (#750, opened May 20, 2025)
- [FT] add `py.typed` so `lighteval` can work with type checkers (#749, opened May 20, 2025)
- [BUG] sync `LightevalTaskConfig` docstring with types/defaults (#748, opened May 20, 2025)
- [FT] allow `httpx>0.27` (#747, opened May 19, 2025)
- [FT] Manage script and language in the Language enum (#745, opened May 19, 2025)
- [BUG] Sampling and max new tokens params for accelerate backend not being applied correctly (#742, opened May 19, 2025)
- [EVAL] TauBench: (#741, opened May 19, 2025)
- [EVAL] SciCode: research coding benchmark (#740, opened May 19, 2025)
- Error with value of `n` (#737, opened May 19, 2025)
- [FT] Load entire benchmark (data + spec) from the hub (#735, opened May 16, 2025)
- [BUG] Optimize tokenization (#732, opened May 15, 2025)
- [EVAL] HELMET: long context evals (#731, opened May 15, 2025)
- [EVAL] SWEBENCH multilingual (#730, opened May 15, 2025)
- [EVAL] Add RULER for evaluating long context (#726, opened May 15, 2025)
- [FT] Add tests for `VLLMModel` base methods (#724, opened May 15, 2025)
- [FT] Continuous batching for transformers (#723, opened May 15, 2025)
- [FT] Support evaluations with tool use (#719, opened May 15, 2025)
- Call for contributions: Translate lighteval's doc into Chinese (#716, opened May 14, 2025)
- [BUG] Installing lighteval breaks hydra-core (#713, opened May 9, 2025)
22 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- [BUG] Misleading metric type of LightevalTaskConfig (#710, commented on May 15, 2025 • 0 new comments)
- [FT] Improve Documentation and Examples (#682, commented on May 15, 2025 • 0 new comments)
- [BUG] encounter an ArrowInvalid error while saving experiment tracker (#660, commented on May 15, 2025 • 0 new comments)
- [FT] Log progress bar on main process (#662, commented on May 15, 2025 • 0 new comments)
- [BUG] Pipeline does not work with GenerationConfig (#657, commented on May 15, 2025 • 0 new comments)
- [BUG] Very slow livecodebench scoring (#650, commented on May 15, 2025 • 0 new comments)
- [BUG] core dump when using endpoint litellm (#623, commented on May 15, 2025 • 0 new comments)
- [FT] Build in a way to specify specific IDs/Lines in Dataset to use as few-shot examples in the same split (#634, commented on May 15, 2025 • 0 new comments)
- [BUG] Num samples not respected (#618, commented on May 15, 2025 • 0 new comments)
- [EVAL] Big-Bench Extra Hard (BBEH) (#600, commented on May 15, 2025 • 0 new comments)
- [FT] Assistant Response Prefilling (#591, commented on May 15, 2025 • 0 new comments)
- [EVAL] Add TUMLU benchmark (#577, commented on May 15, 2025 • 0 new comments)
- [FT] LiteLLM concurrency parameters hard-coded (#567, commented on May 15, 2025 • 0 new comments)
- <think> tags for thinking models (#513, commented on May 15, 2025 • 0 new comments)
- [EVAL] Adding PHARE (#696, commented on May 18, 2025 • 0 new comments)
- [FT] Custom model to TransformersModel (#489, commented on May 19, 2025 • 0 new comments)
- Loading local data for custom tasks (#681, commented on May 20, 2025 • 0 new comments)
- [FT] Numpy 2.0 support (#416, commented on Jun 5, 2025 • 0 new comments)
- new metrics and pr-fouras dataset add (#558, commented on Jun 6, 2025 • 0 new comments)
- Nanotron model updates (#652, commented on May 12, 2025 • 0 new comments)
- [WIP] Fix nanotron compatibility (#706, commented on May 20, 2025 • 0 new comments)
- refacto prompt building (#709, commented on Jun 5, 2025 • 0 new comments)