Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.
benchmark leaderboard gemini llama language-model claude rag o1 hallucinations ai-evaluation llm gemini-pro llm-benchmarking confabulations deepseek-r1 o3-mini
Updated May 29, 2025 - HTML