Stars
A survey on harmful fine-tuning attacks for large language models
Physics Master is a model fine-tuned from llama3-8B-Instruct. It can answer your physics questions!
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
DeepRAG: Thinking to Retrieve Step by Step for Large Language Models
Graphical Java application for managing BibTeX and BibLaTeX (.bib) databases
Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Understanding R1-Zero-Like Training: A Critical Perspective
Unleashing the Power of Reinforcement Learning for Math and Code Reasoners
A lightweight tool for evaluating LLMs in rule-based ways.
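A minimal sketch of what rule-based evaluation can look like, assuming the rules are regex / exact-match checks on the model's final answer (the rule and the example strings below are illustrative, not taken from the tool itself):

```python
import re

# Illustrative rule: extract the final \boxed{...} numeric answer from a math response.
BOXED_ANSWER = re.compile(r"\\boxed\{(-?\d+(?:\.\d+)?)\}")

def rule_based_check(model_output: str, gold_answer: str) -> bool:
    """Return True if the model's boxed numeric answer equals the gold answer."""
    match = BOXED_ANSWER.search(model_output)
    return match is not None and match.group(1) == gold_answer

print(rule_based_check(r"So the result is \boxed{42}.", "42"))  # True
print(rule_based_check("I am not sure.", "42"))                 # False
```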
An Azure Function solution to crawl through all of your image files in GitHub and losslessly compress them. This will make the file size go down, but leave the dimensions and quality untouched. Onc…
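A minimal sketch of the underlying idea in Python with Pillow (an assumption for illustration; the actual tool runs as an Azure Function, not as this script): re-encode a PNG with optimized settings so the file shrinks while the pixels and dimensions stay identical.

```python
import os
from PIL import Image

def compress_png_losslessly(path: str) -> None:
    """Re-encode a PNG in place with optimized settings; pixel data is unchanged."""
    size_before = os.path.getsize(path)
    img = Image.open(path)
    img.load()                      # read pixel data fully before rewriting the same file
    img.save(path, optimize=True)   # lossless for PNG: only the encoding changes
    size_after = os.path.getsize(path)
    print(f"{path}: {size_before} -> {size_after} bytes")

# compress_png_losslessly("logo.png")
```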
antgroup / ant-ray
Forked from ray-project/ray. Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. AntRay is forked from ray, offering incremental new features on top …
Simple, modern and fast file watching and code reload in Python.
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
A Datacenter Scale Distributed Inference Serving Framework
Debugging torch distributed programs
Synchronized viewing, theater, live streaming, video
ocss884 / verl
Forked from volcengine/verl. veRL: Volcano Engine Reinforcement Learning for LLM
Code for "SemDeDup", a simple method for identifying and removing semantic duplicates from a dataset (data pairs which are semantically similar, but not exactly identical).
Efficient Triton Kernels for LLM Training
A visualization tool for deeper understanding and easier debugging of RLHF training.
Official Repo for Open-Reasoner-Zero