
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering


icip-cas/Verifier-Engineering



Overview

This repository collects papers and other resources on verifier engineering. It accompanies the paper "Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering". We update the paper and this repository regularly and welcome suggestions of any kind.

Note

🌟 Feel free to submit pull requests to share your work and insights from the perspective of verifier engineering - your contributions are always welcome!

Overview of Common Verifiers

| Verifier Type | Verification Form | Verify Granularity | Verifier Source | Extra Training |
| --- | --- | --- | --- | --- |
| Golden Annotation | Binary/Text | Thought Step/Full Trajectory | Program Based | No |
| Rule-based | Binary/Text | Thought Step/Full Trajectory | Program Based | No |
| Code Interpreter | Binary/Score/Text | Token/Thought Step/Full Trajectory | Program Based | No |
| ORM | Binary/Score/Rank/Text | Full Trajectory | Model Based | Yes |
| Language Model | Binary/Score/Rank/Text | Thought Step/Full Trajectory | Model Based | Yes |
| Tool | Binary/Score/Rank/Text | Token/Thought Step/Full Trajectory | Program Based | No |
| Search Engine | Text | Thought Step/Full Trajectory | Program Based | No |
| PRM | Score | Token/Thought Step | Model Based | Yes |
| Knowledge Graph | Text | Thought Step/Full Trajectory | Program Based | No |
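
To make the columns of the table above concrete, here is a minimal Python sketch of how a verifier and its attributes might be represented. Everything in it is an illustrative assumption rather than code from this repository or the paper: the `Verifier` dataclass, the `golden_annotation_check` and `rule_based_length_check` helpers, and the specific rules they apply are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

# Hypothetical type: a verifier may return a binary verdict, a score, or text.
Feedback = Union[bool, float, str]


@dataclass(frozen=True)
class Verifier:
    """One row of the table, plus a callable that performs the check."""
    name: str
    form: str             # e.g. "Binary", "Score", "Text"
    granularity: str      # e.g. "Thought Step", "Full Trajectory"
    source: str           # "Program Based" or "Model Based"
    needs_training: bool  # the "Extra Training" column
    check: Callable[[str, str], Feedback]  # (question, response) -> feedback


def golden_annotation_check(gold: Dict[str, str]) -> Callable[[str, str], bool]:
    """Build a binary check that compares a response's last line to a gold answer."""
    def check(question: str, response: str) -> bool:
        lines = response.strip().splitlines()
        final = lines[-1].strip() if lines else ""
        # Unknown questions have no gold answer, so they never pass.
        return final == gold.get(question)
    return check


def rule_based_length_check(question: str, response: str) -> bool:
    """A toy rule (assumed for illustration): non-empty and at most 50 words."""
    n_words = len(response.split())
    return 0 < n_words <= 50


golden = Verifier(
    name="Golden Annotation",
    form="Binary",
    granularity="Full Trajectory",
    source="Program Based",
    needs_training=False,
    check=golden_annotation_check({"2+3?": "5"}),
)
```

Calling `golden.check("2+3?", "2 plus 3\n5")` returns `True` under these assumptions, because the last line of the response matches the gold answer. A model-based verifier such as an ORM or PRM would fill the same `check` slot with a call to a trained model.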

A Verifier Engineering Perspective on Post-training Methods

| Method | Search | Verify | Feedback | Task |
| --- | --- | --- | --- | --- |
| STaR, RFT, WizardMath | Linear | Golden Annotation | Imitation Learning | Math |
| CAG | Linear | Golden Annotation | Imitation Learning | RAG |
| Self-Instruct | Linear | Rule-based | Imitation Learning | General |
| Code Alpaca, WizardCoder | Linear | Rule-based | Imitation Learning | Code |
| ILF-Code | Linear | Code Interpreter, Human | Imitation Learning | Code |
| RAFT, RRHF | Linear | ORM | Imitation Learning | General |
| SSO | Linear | Rule-based | Preference Learning | Alignment |
| CodeUltraFeedback | Linear | Language Model | Preference Learning | Code |
| Self-Rewarding | Linear | Language Model | Preference Learning | Alignment |
| StructRAG | Linear | Language Model | Preference Learning | RAG |
| MCTS-DPO | Tree | Language Model | Preference Learning | Math |
| Chain of Preference Optimization | Tree | Language Model | Preference Learning | Reasoning |
| LLAMA-BERRY | Tree | ORM | Preference Learning | Reasoning |
| Math-Shepherd | Linear | Golden Annotation, Rule-based | Reinforcement Learning | Math |
| RLTF, PPOCoder | Linear | Code Interpreter | Reinforcement Learning | Code |
| RLAIF | Linear | Language Model | Reinforcement Learning | General |
| SIRLC | Linear | Language Model | Reinforcement Learning | Reasoning |
| RLFH | Linear | Language Model | Reinforcement Learning | Knowledge |
| RLHF | Linear | ORM | Reinforcement Learning | Alignment |
| Quark | Linear | Tool | Reinforcement Learning | Alignment |
| RLVR | Linear | Binary Verifier | Reinforcement Learning | General |
| ReST-MCTS | Tree | Language Model | Reinforcement Learning | Math |
| CRITIC | Linear | Code Interpreter, Tool, Search Engine | Verifier-Aware | Math, Code, Knowledge, General |
| Self-Debug | Linear | Code Interpreter | Verifier-Aware | Code |
| Self-Refine | Linear | Language Model | Verifier-Aware | Alignment |
| ReAct | Linear | Search Engine | Verifier-Aware | Knowledge |
| Contrastive Decoding | Linear | Language Model | Verifier-Guided | General |
| Chain-of-Verification | Linear | Language Model | Verifier-Guided | Knowledge |
| Inverse Value Learning | Linear | Language Model | Verifier-Guided | General |
| PRM | Linear | PRM | Verifier-Guided | Math |
| KGR | Linear | Knowledge Graph | Verifier-Guided | Knowledge |
| UoT | Tree | Language Model | Verifier-Guided | General |
| ToT | Tree | Language Model | Verifier-Guided | Reasoning |
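
The three stages named in the table's columns can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not the method of any paper listed above: it assumes that each candidate comes from one sequential generation pass and that several such passes are drawn so there is something to compare, and every function name (`sample_candidates`, `verify`, `verifier_guided_select`, `preference_pair`, `toy_generate`, `toy_score`) is hypothetical.

```python
from typing import Callable, List, Tuple

Scored = List[Tuple[str, float]]


def sample_candidates(generate: Callable[[str, int], str], prompt: str, n: int) -> List[str]:
    """Search stage: draw n candidates, one generation pass each (n >= 1)."""
    return [generate(prompt, i) for i in range(n)]


def verify(score: Callable[[str, str], float], prompt: str, candidates: List[str]) -> Scored:
    """Verify stage: attach a verifier score to every candidate."""
    return [(c, score(prompt, c)) for c in candidates]


def verifier_guided_select(scored: Scored) -> str:
    """Feedback at inference time: return the highest-scoring candidate."""
    return max(scored, key=lambda pair: pair[1])[0]


def preference_pair(scored: Scored) -> Tuple[str, str]:
    """Feedback at training time: build a (chosen, rejected) pair for preference learning."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked[0][0], ranked[-1][0]


# Toy stand-ins for a policy model and a verifier (assumptions for illustration).
def toy_generate(prompt: str, seed: int) -> str:
    return str(seed + 2)  # yields "2", "3", "4", "5" for seeds 0..3


def toy_score(prompt: str, candidate: str) -> float:
    return -abs(int(candidate) - 4)  # closer to the true answer 4 scores higher


scored = verify(toy_score, "2+2", sample_candidates(toy_generate, "2+2", 4))
best = verifier_guided_select(scored)        # "4"
chosen, rejected = preference_pair(scored)   # ("4", "2")
```

The same scored candidates feed both feedback styles: picking the best one corresponds to the Verifier-Guided rows, while the (chosen, rejected) pair is the kind of signal the Preference Learning rows consume. A tree search would replace `sample_candidates` with a procedure that expands and prunes partial responses.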

Citation

If you find our repo useful in your research, please consider citing:

@article{VerifierEngineering,
    title={Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering},
    author={Xinyan Guan and Yanjiang Liu and Xinyu Lu and Boxi Cao and Ben He and Xianpei Han and Le Sun and Jie Lou and Bowen Yu and Yaojie Lu and Hongyu Lin},
    journal={arXiv preprint arXiv:2411.11504},
    url={https://arxiv.org/abs/2411.11504},
    year={2024}
}

