NinaTian98369 (Yufei) / Starred · GitHub

This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses".

Python 29 3 Updated Aug 18, 2024

Arena-Hard-Auto: An automatic LLM benchmark.

Python 837 101 Updated May 1, 2025

Generative Judge for Evaluating Alignment

Python 238 14 Updated Jan 18, 2024

Evaluate the Quality of Critique

Python 35 Updated Jun 1, 2024

This is the repo for the paper Shepherd -- A Critic for Language Model Generation

Jupyter Notebook 218 9 Updated Aug 10, 2023

Code base for ICLR 2024 "Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature".

Python 303 50 Updated Apr 3, 2025

A survey and reflection on the latest research breakthroughs in LLM-generated Text detection, including data, detectors, metrics, current issues and future directions.

220 13 Updated Dec 30, 2024

TuRnIng POint Dataset

Python 46 3 Updated Oct 17, 2019

potato: portable text annotation tool

Jupyter Notebook 333 56 Updated Apr 29, 2025

A library for advanced large language model reasoning

Python 2,132 189 Updated Apr 9, 2025

[ACL 2023] Reasoning with Language Model Prompting: A Survey

961 69 Updated May 21, 2025

KokoMind: Can LLMs Understand Social Interactions?

JavaScript 104 8 Updated Oct 3, 2023
Python 52 4 Updated Jun 29, 2023

Reasoning with Language Model is Planning with World Model

PDDL 166 18 Updated Aug 25, 2023

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型

Python 41,066 5,221 Updated Jun 27, 2024

Instruct-tune LLaMA on consumer hardware

Jupyter Notebook 18,909 2,230 Updated Jul 29, 2024

Source code and data for The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code (Findings of ACL 2023)

Python 29 2 Updated Jun 4, 2023

Inference code for Llama models

Python 58,298 9,780 Updated Jan 26, 2025

WikiWhy is a new benchmark for evaluating LLMs' ability to explain cause-effect relationships. It is a QA dataset containing 9000+ "why" question-answer-rationale triplets.

Python 47 1 Updated Dec 7, 2023

Repo for Generating Flashbacks in Stories (NAACL'22)

Python 7 1 Updated Apr 28, 2022

Accompanying repo for the RLPrompt paper

Python 329 60 Updated Jun 6, 2024

NAACL Paper (IMHO Fine-Tuning Improves Claim Detection)

Python 8 1 Updated May 5, 2020

Python client for Moss: A System for Detecting Software Similarity

Python 399 74 Updated Jul 4, 2024
Python 1 Updated Jan 6, 2023

Assessing Humor in Edited News Headlines

Python 7 5 Updated Sep 8, 2020
Python 43 5 Updated Mar 24, 2023

Diverse Beam Search in Pytorch

Python 7 1 Updated Feb 26, 2019

EMNLP 2021 - CTC: A Unified Framework for Evaluating Natural Language Generation

Python 96 11 Updated Mar 20, 2023