- SkyWork
- ChengDu
- www.giantpandacv.com
-
How to optimize some algorithms in CUDA.
-
Awesome-ML-SYS-Tutorial Public
Forked from zhaochenyang20/Awesome-ML-SYS-Tutorial
My learning notes/codes for ML SYS.
-
Panzhihua-Mi-Yi-Pipa Public
If you want to purchase Panzhihua Mi Yi Pipa, please contact me.
-
DeepGEMM Public
Forked from deepseek-ai/DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Cuda MIT License Updated Feb 27, 2025
-
ml-engineering Public
Forked from stas00/ml-engineering
Machine Learning Engineering Open Book
-
sglang Public
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Python Apache License 2.0 Updated Jan 18, 2025
-
HunyuanVideo Public
Forked from Tencent-Hunyuan/HunyuanVideo
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Python Other Updated Dec 20, 2024
-
vllm Public
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Python Apache License 2.0 Updated Nov 23, 2024
-
ao Public
Forked from pytorch/ao
PyTorch native quantization and sparsity for training and inference
-
flash-attention Public
Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
-
TiledCUDA Public
Forked from TiledTensor/TiledCUDA
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
C++ MIT License Updated Sep 6, 2024
-
Image-processing-algorithm Public
Paper implementations.
-
deepseekv2-profile Public
Forked from madsys-dev/deepseekv2-profile
Jupyter Notebook Updated May 31, 2024
-
accelerate Public
Forked from huggingface/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
-
nndeploy Public
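Forked from nndeploy/nndeploy
nndeploy is an end-to-end model deployment framework. Built around multi-platform inference and model deployment based on directed acyclic graphs, it aims to give users a cross-platform, easy-to-use, high-performance model deployment experience.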
-
kineto Public
Forked from pytorch/kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
HTML Other Updated Apr 15, 2024
-
How to learn PyTorch and OneFlow
-
transformers Public
Forked from huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
-
RWKV-CUDA Public
Forked from BlinkDL/RWKV-CUDA
The CUDA version of the RWKV language model (https://github.com/BlinkDL/RWKV-LM)
Cuda Updated Jan 2, 2024
-
lm-evaluation-harness Public
Forked from EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
Python MIT License Updated Dec 22, 2023