NVIDIA
- Toronto, Canada

Stars
A holistic way of understanding how Llama and its components run in practice, with code and detailed documentation.
A Datacenter Scale Distributed Inference Serving Framework
Minimal reproduction of DeepSeek R1-Zero
Generative AI extensions for onnxruntime
An autoregressive character-level language model for making more things
Neural Networks: Zero to Hero
DSPy: The framework for programming—not prompting—language models
llama3.np is a pure NumPy implementation of the Llama 3 model.
A VSCode extension to generate development environments using micromamba and the conda-forge package repository
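The core of any pure-NumPy Llama implementation is scaled dot-product attention with a causal mask. A minimal single-head sketch (shapes and function names here are illustrative assumptions, not llama3.np's actual code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=-1, keepdims=True)

def causal_attention(q, k, v):
    """q, k, v: (seq_len, head_dim) arrays for a single head."""
    seq_len, head_dim = q.shape
    scores = q @ k.T / np.sqrt(head_dim)
    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((5, 16))
k = rng.standard_normal((5, 16))
v = rng.standard_normal((5, 16))
out = causal_attention(q, k, v)
assert out.shape == (5, 16)
# The first position attends only to itself, so it reproduces v[0].
assert np.allclose(out[0], v[0])
```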
A book about compiling Racket and Python to x86-64 assembly
A Python framework for accelerated simulation, data generation and spatial computing.
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
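The idea that makes serving thousands of adapters feasible is that LoRA keeps the base weight W frozen and adds a cheap low-rank correction B @ A per adapter. A minimal NumPy sketch of that decomposition (shapes and names are illustrative assumptions, not S-LoRA's API):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 64, 8

W = rng.standard_normal((d_out, d_in))        # frozen base weight, shared by all adapters
A = rng.standard_normal((rank, d_in)) * 0.01  # adapter down-projection
B = np.zeros((d_out, rank))                   # adapter up-projection (zero-initialized)

x = rng.standard_normal(d_in)

# The expensive W @ x is computed once; each adapter only adds a
# low-rank correction B @ (A @ x), which is O(rank * d) extra work.
base = W @ x
y = base + B @ (A @ x)

# With B initialized to zero, the adapted output equals the base output.
assert np.allclose(y, base)
```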
Development repository for the Triton language and compiler
Enabling CPython multi-core parallelism via subinterpreters.
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
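The BPE training loop itself fits in a few lines: repeatedly count adjacent token pairs over the byte stream and merge the most frequent pair into a new token. A minimal sketch of that loop (function names here are illustrative, not the repo's exact API):

```python
from collections import Counter

def get_pair_counts(ids):
    """Count occurrences of each adjacent token pair."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn `num_merges` merge rules over the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        new_id = 256 + step                 # byte values occupy 0..255
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges
```

Encoding new text then replays the learned merges in order; decoding inverts the merge table back down to bytes.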
The Incredible PyTorch: a curated list of tutorials, papers, projects, communities and more relating to PyTorch.
Fast and memory-efficient exact attention
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs.
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
Utilities for using Python's PEP 554 subinterpreters
torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters in a single C++ process.
High accuracy RAG for answering questions from scientific documents with citations
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
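The essence of such a scalar autograd engine is a `Value` node that records its inputs and a local backward rule, then applies the chain rule in reverse topological order. A minimal sketch in that spirit (illustrative, not the repo's exact code):

```python
class Value:
    """A scalar that tracks the computation graph for backpropagation."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad   # d(a+b)/da = 1
            other.grad += out.grad  # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a = Value(2.0)
b = Value(-3.0)
c = a * b + a          # c = -6 + 2 = -4
c.backward()
assert c.data == -4.0
assert a.grad == -2.0  # dc/da = b + 1
assert b.grad == 2.0   # dc/db = a
```

Gradients accumulate with `+=` so that a node used more than once (like `a` above) sums contributions from every path, matching PyTorch's semantics.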
Some notes on things I find interesting and important.
`std::execution`, the proposed C++ framework for asynchronous and parallel programming.