Stars
Development repository for the Triton language and compiler
Curated list of project-based tutorials
How to learn PyTorch and OneFlow
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
DeepEP: an efficient expert-parallel communication library
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
A repo to hold community-submitted rattler-build recipes, to make community packages available via the modular-community prefix.dev channel
GNU toolchain for RISC-V, including GCC
Triton documentation in Simplified Chinese
Interactive roadmaps, guides and other educational content to help developers grow in their careers.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
A GPT-4 AI Tutor Prompt for customizable personalized learning experiences.
LLM-based autonomous agent that conducts deep local and web research on any topic and generates a long report with citations.
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Ongoing research training transformer models at scale
A series of large language models trained from scratch by developers @01-ai
GPT-Fathom is an open-source and reproducible LLM evaluation suite, benchmarking 10+ leading open-source and closed-source LLMs as well as OpenAI's earlier models on 20+ curated benchmarks under al…
Hackable and optimized Transformers building blocks, supporting a composable construction.
A model compilation solution for various hardware
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.
AMD ROCm Performance Primitives (RPP) library is a comprehensive high-performance computer vision library for AMD processors with HIP/OpenCL/CPU back-ends.