Boreas618 (Yi Sun) / Starred · GitHub

A high-performance library for compressed ndarrays, with a flexible computational engine

Python · 142 stars · 24 forks · Updated May 13, 2025

C++ extensions in PyTorch

Python · 1,089 stars · 231 forks · Updated Jan 24, 2025

[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

Python · 203 stars · 12 forks · Updated Feb 22, 2025

depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.

Python · 668 stars · 24 forks · Updated Apr 20, 2025

A lightweight design for computation-communication overlap.

Cuda · 108 stars · 2 forks · Updated May 6, 2025

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

Cuda · 85 stars · 6 forks · Updated May 12, 2025

Tigon: A Distributed Database for a CXL Pod [OSDI '25]

C++ · 10 stars · 3 forks · Updated May 6, 2025

Ultra | Ultimate | Unified CCL

C++ · 66 stars · 3 forks · Updated Feb 14, 2025

TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches

Python · 73 stars · 10 forks · Updated Jul 25, 2023

Analyze the inference of Large Language Models (LLMs) in terms of computation, storage, transmission, and the hardware roofline model, through a user-friendly interface.

Python · 455 stars · 56 forks · Updated Sep 11, 2024

Compare different hardware platforms via the Roofline Model for LLM inference tasks.

Jupyter Notebook · 100 stars · 4 forks · Updated Mar 13, 2024

MIT IAP short course: Matrix Calculus for Machine Learning and Beyond

Jupyter Notebook · 482 stars · 65 forks · Updated Feb 3, 2025

matmul using AMX instructions

C++ · 13 stars · 5 forks · Updated May 7, 2024

A high-performance library for building cache simulators

C++ · 219 stars · 63 forks · Updated May 10, 2025

Learning assembly for Linux x86_64

Assembly · 2,855 stars · 336 forks · Updated May 2, 2025

NVIDIA Linux open GPU with P2P support

C · 1,136 stars · 109 forks · Updated May 5, 2025

A static analyzer for Java, C, C++, and Objective-C

OCaml · 15,204 stars · 2,033 forks · Updated May 12, 2025

Course notes for everyone

15 stars · Updated May 6, 2025

Distributed Triton for Parallel Systems

Python · 690 stars · 43 forks · Updated May 12, 2025

Perplexity GPU Kernels

C++ · 285 stars · 31 forks · Updated May 13, 2025

VAST is an experimental compiler pipeline designed for program analysis of C and C++. It provides a tower of IRs as MLIR dialects to choose the best fit representations for a program analysis or fu…

C++ · 418 stars · 29 forks · Updated Apr 24, 2025

Performance instrumentation and tracing for Android, Linux and Chrome

C++ · 3,867 stars · 460 forks · Updated May 14, 2025

Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"

Python · 40 stars · 6 forks · Updated Nov 24, 2024

Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events

C++ · 8,177 stars · 901 forks · Updated May 13, 2025

A Memory-Disaggregated Managed Runtime.

66 stars · 6 forks · Updated Aug 28, 2021

Reference counting in C

C · 33 stars · 4 forks · Updated Apr 14, 2023

This is the repository that holds the artifacts of ASPLOS'25 -- M5: Mastering Page Migration and Memory Management for CXL-based Tiered Memory Systems

C · 12 stars · Updated Apr 1, 2025

Simple, portable, and self-contained stacktrace library for C++11 and newer

C++ · 949 stars · 106 forks · Updated May 14, 2025

Window management made elegant.

Swift · 8,274 stars · 173 forks · Updated Apr 14, 2025