irasin (Sin) / Starred · GitHub

Simple MPI implementation for prototyping or learning

C 259 9 Updated Jun 27, 2025

PDF textbooks for all primary, middle, and high school grades, as well as university.

Roff 44,323 9,899 Updated May 18, 2025

C++ extensions in PyTorch

Python 1,112 237 Updated Jul 8, 2025

A powerful and artistic UI library based on PyQt5: agile, elegant, and lightweight

Python 997 96 Updated Jul 9, 2025
11 5 Updated Apr 27, 2013

Tongyi Qianwen (Qwen) vLLM inference deployment demo

Python 587 86 Updated Mar 28, 2024

Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement.

C++ 261 51 Updated Jan 13, 2025

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C++ 1,380 562 Updated Feb 15, 2025

We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstra…

C++ 183 11 Updated Jan 28, 2025

C++ Tip Of The Week

1,624 75 Updated May 20, 2025

An Easy-to-understand TensorOp Matmul Tutorial

C++ 365 47 Updated Sep 21, 2024

Tile primitives for speedy kernels

Cuda 2,510 159 Updated Jul 7, 2025

Random for modern C++ with convenient API

C++ 940 84 Updated Jul 9, 2025

C++20 μ(micro)/Unit Testing framework

C++ 1,348 134 Updated Jul 7, 2025

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 8,565 664 Updated Aug 18, 2024

Resources on LLM-based applications that use the RAG pattern

1,471 93 Updated Jan 22, 2025

A simple example of implementing RAG with LangChain

Jupyter Notebook 523 80 Updated Jun 7, 2025

A comprehensive guide to building RAG-based LLM applications for production.

Jupyter Notebook 1,800 250 Updated Aug 2, 2024

A simple high performance CUDA GEMM implementation.

Cuda 385 42 Updated Jan 4, 2024
Cuda 137 17 Updated Mar 18, 2024

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 5,425 573 Updated Jun 29, 2025

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 474 45 Updated Apr 15, 2025

Benchmark code for the "Online normalizer calculation for softmax" paper

Cuda 95 9 Updated Jul 27, 2018

Collection of benchmarks to measure basic GPU capabilities

C++ 390 57 Updated Feb 11, 2025

C++ project template with unit-tests, documentation, ci-testing and workflows.

CMake 277 101 Updated Jul 15, 2024

An extension library of WMMA API (Tensor Core API)

Cuda 99 16 Updated Jul 12, 2024