Stars
Official Implementation of RISE (Reinforcing Reasoning with Self-Verification)
A series of technical reports on Slow Thinking with LLMs
Fully open reproduction of DeepSeek-R1
basically all the things I used for this article
Benchmarking LLMs' Gaming Ability in Multi-Agent Environments
Benchmarking LLMs' Psychological Portrayal
Benchmarking LLMs' Emotional Alignment with Humans
MTTM: Metamorphic Testing for Textual Content Moderation Software
Multilingual safety benchmark for Large Language Models
A programmer's guide to cooking at home (Simplified Chinese only).
[TOG 2024] StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter
[Nature Reviews Bioengineering🔥] Application of Large Language Models in Medicine. A curated list of practical guides and resources for Medical LLMs (Medical LLMs Tree, Tables, and Papers)
[NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning
A Survey on the Honesty of Large Language Models
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
The official implementation of 'FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant'
Everything about note management. All in Zotero.
[ICLR 2025] ChartMimic: Evaluating LMM’s Cross-Modal Reasoning Capability via Chart-to-Code Generation
Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓