Real Time Head Pose Estimation: Accurate head pose estimation using ResNet 18/34/50 and MobileNet V2/V3 models. Evaluate yaw, pitch, and roll with pre-trained weights for quick integration.

Python 40 5 Updated Mar 28, 2025

allenai / olmocr

Toolkit for linearizing PDFs for LLM datasets/training

Python 12,673 894 Updated May 30, 2025

huawei-noah / bolt

Bolt is a deep learning library with high performance and heterogeneous flexibility.

C++ 947 162 Updated Apr 11, 2025

TideDra / lmm-r1

Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.

Python 761 46 Updated May 14, 2025

om-ai-lab / VLM-R1

Solve Visual Understanding with Reinforced VLMs

Python 5,035 308 Updated May 11, 2025

openreasoner / openr

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Python 1,776 135 Updated Jan 17, 2025

RyanLiu112 / compute-optimal-tts

Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".

Python 261 21 Updated Feb 19, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 24,634 2,279 Updated May 28, 2025

Unakar / Logic-RL

Reproduce R1 Zero on Logic Puzzle

Python 2,348 155 Updated Mar 20, 2025

Jiayi-Pan / TinyZero

Minimal reproduction of DeepSeek R1-Zero

Python 11,840 1,489 Updated Apr 24, 2025

unslothai / unsloth

Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥

Python 39,801 3,144 Updated Jun 1, 2025

QwenLM / Qwen2.5-VL

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 10,733 773 Updated May 15, 2025

Deep-Agent / R1-V

Witness the aha moment of VLM with less than $3.

Python 3,707 286 Updated May 19, 2025

microsoft / rStar

Python 555 49 Updated Apr 15, 2025

modelscope / ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…

Python 7,865 667 Updated Jun 1, 2025

xlang-ai / aguvis

[ICML2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Python 306 19 Updated Mar 7, 2025

VITA-MLLM / VITA

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,309 169 Updated Mar 28, 2025

suous / RecNeXt

RecConv: Efficient Recursive Convolutions for Multi-Frequency Representations

Python 15 Updated Jan 8, 2025

PRIME-RL / PRIME

Scalable RL solution for advanced reasoning of language models

Python 1,588 91 Updated Mar 18, 2025

yeungchenwa / HDR

[AAAI2025 Oral] Predicting the Original Appearance of Damaged Historical Documents

Python 80 5 Updated Mar 20, 2025

ADaM-BJTU / OpenRFT

OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning

Python 141 3 Updated Dec 24, 2024

PKU-YuanGroup / LLaVA-CoT

LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning

Python 1,996 76 Updated May 13, 2025

lqtrung1998 / mwp_ReFT

Python 539 63 Updated Jan 2, 2025

Byaidu / PDFMathTranslate

PDF scientific paper translation with preserved formats: AI-based full-text bilingual translation of PDF documents that fully preserves the layout, supporting services such as Google/DeepL/Ollama/OpenAI, and providing CLI/GUI/MCP/Docker/Zotero

Python 24,319 2,094 Updated May 28, 2025

Zhaohui Wang Zhaohuii-Wang

Stars

pipecat-ai / pipecat

volcengine / verl

VLM-RL / Ocean-R1

IDEA-FinAI / ChartMoE

Liuziyu77 / Visual-RFT

HumanMLLM / R1-Omni

yakhyo / head-pose-estimation