csuking / Starred · GitHub

Python 1,181 44 Updated Jun 14, 2025

One-shot Entropy Minimization

Python 137 7 Updated Jun 13, 2025

Open-source unified multimodal model

Python 4,083 314 Updated Jun 15, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,151 244 Updated Jun 12, 2025

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 10,986 785 Updated May 15, 2025

Official Repository of "Learning to Reason under Off-Policy Guidance"

Python 229 23 Updated Jun 3, 2025

TTRL: Test-Time Reinforcement Learning

Python 628 45 Updated Jun 6, 2025

An open source implementation of CLIP.

Python 11,943 1,111 Updated Jun 10, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 49,669 7,995 Updated Jun 15, 2025

Writing AI Conference Papers: A Handbook for Beginners

2,464 82 Updated Jun 5, 2025

Ongoing research training transformer models at scale

Python 12,581 2,838 Updated Jun 14, 2025

Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…

Python 2 Updated May 10, 2025

Skywork-R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning

Python 2,621 251 Updated Jun 10, 2025

Official Repo for Open-Reasoner-Zero

Python 1,962 104 Updated Jun 2, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 2,653 193 Updated Jun 14, 2025

Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'

Jupyter Notebook 1,971 84 Updated May 21, 2025

A fork to add multimodal model training to open-r1

Python 1,300 63 Updated Feb 8, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 9,465 1,292 Updated Jun 14, 2025

This is the official repository for The Hundred-Page Language Models Book by Andriy Burkov

Jupyter Notebook 1,779 295 Updated May 21, 2025
Jupyter Notebook 947 113 Updated May 9, 2025

FlashMLA: Efficient MLA decoding kernels

Cuda 11,598 844 Updated Apr 29, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…

Python 8,098 702 Updated Jun 15, 2025

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 3,366 308 Updated May 13, 2025

The official Meta Llama 3 GitHub site

Python 28,779 3,396 Updated Jan 26, 2025

Reproduce R1 Zero on Logic Puzzle

Python 2,354 158 Updated Mar 20, 2025

A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.

Python 240 10 Updated Apr 15, 2025

Align Anything: Training All-modality Model with Feedback

Jupyter Notebook 3,970 494 Updated May 28, 2025

A curated list of reinforcement learning with human feedback resources (continually updated)

3,982 243 Updated Apr 30, 2025

Simple RL training for reasoning

Python 3,627 271 Updated Apr 10, 2025