Scientific NLP, Science of Science, Recommendation Systems, OS, Rust
Tencent
China
Stars
LLM-eval
Evaluation tools for LLMs
6 repositories
Code for the paper "Evaluating Large Language Models Trained on Code"
CMMLU: Measuring massive multitask language understanding in Chinese
Official github repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]
A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark
A framework for few-shot evaluation of language models.
Do Multilingual Language Models Think Better in English?