lms-mt / Starred · GitHub
vits2 backbone with multilingual-bert

Python 8,493 1,211 Updated Jul 7, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 3,320 368 Updated Jul 8, 2025

Accessible large language models via k-bit quantization for PyTorch.

Python 7,207 716 Updated Jul 2, 2025

Character Animation (AnimateAnyone, Face Reenactment)

Python 3,413 272 Updated May 31, 2024

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Python 6,009 559 Updated Apr 11, 2025

LLM inference in C/C++

C++ 82,720 12,292 Updated Jul 8, 2025

MiniLLM is a minimal system for running modern LLMs on consumer-grade GPUs

Python 915 58 Updated May 15, 2023

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

Python 37,407 3,279 Updated Aug 17, 2024

A library for calculating the FLOPs in the forward() process based on torch.fx

Python 118 7 Updated Apr 1, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ 10,955 1,562 Updated Jul 8, 2025

Count the MACs / FLOPs of your PyTorch model.

Python 5,014 533 Updated Jul 8, 2024

An efficient CUDA implementation of 2D depthwise convolution for large kernels, usable from the PyTorch deep learning framework.

Cuda 10 1 Updated Sep 28, 2023

Optimize GEMM with Tensor Cores step by step

28 6 Updated Dec 17, 2023

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda 1,757 457 Updated Oct 9, 2023

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 51,754 8,566 Updated Jul 8, 2025

torch_musa is an open source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics cards.

Python 417 30 Updated Jun 26, 2025

Fast and memory-efficient exact attention

Python 18,242 1,789 Updated Jul 6, 2025

Annotations of the interesting ML papers I read

242 24 Updated Jul 3, 2025

Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052

C++ 475 37 Updated Mar 15, 2024

Transformer related optimization, including BERT, GPT

C++ 6,231 909 Updated Mar 27, 2024

CUDA Templates for Linear Algebra Subroutines

C++ 7,809 1,301 Updated Jul 6, 2025

Development repository for the Triton language and compiler

MLIR 16,075 2,099 Updated Jul 8, 2025

The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++

CSS 43,884 5,494 Updated May 8, 2025

PyTorch Tutorial for Deep Learning Researchers

Python 31,459 8,225 Updated Aug 15, 2023

Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023

Jupyter Notebook 2,903 630 Updated Mar 16, 2025

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Jupyter Notebook 14,382 3,352 Updated Aug 12, 2024

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,598 312 Updated Oct 19, 2024