troore (Xuechao Wei) / Starred · GitHub
45 results for source starred repositories

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 2,018 220 Updated Jun 10, 2025

Fortran 10 1 Updated Sep 14, 2023

GNNear: Accelerating Full-Batch Training of Graph Neural Networks with Near-Memory Processing

C++ 13 1 Updated Sep 15, 2022

The Artifact of NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering

52 5 Updated Aug 11, 2024

LLM Inference analyzer for different hardware platforms

Jupyter Notebook 72 17 Updated May 28, 2025

Compare different hardware platforms via the Roofline Model for LLM inference tasks.

Jupyter Notebook 100 4 Updated Mar 13, 2024

Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).

C++ 122 113 Updated Jun 11, 2025

LLM inference in C/C++

C++ 81,646 12,071 Updated Jun 11, 2025

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

C++ 369 137 Updated May 7, 2025

Latency and Memory Analysis of Transformer Models for Training and Inference

Python 426 50 Updated Apr 19, 2025

This is the FreePDK45 V1.4 Process Development Kit for the 45 nm technology

HTML 25 1 Updated Feb 22, 2021

Serving multiple LoRA-finetuned LLMs as one

Python 1,065 52 Updated May 8, 2024

A benchmark suite for xillybus

VHDL 6 1 Updated Feb 21, 2016

An integrated power, area, and timing modeling framework for multicore and manycore architectures

C++ 186 73 Updated Aug 8, 2020

Python-based research interface for blackbox and hyperparameter optimization, based on the internal Google Vizier Service.

Python 1,565 101 Updated May 7, 2025

Hardware utilities with Spinal HDL

Scala 1 Updated Feb 22, 2022

Provide Python access to the NVML library for GPU diagnostics

Python 238 33 Updated Dec 2, 2024

Cavs: An Efficient Runtime System for Dynamic Neural Networks

C++ 14 3 Updated Sep 18, 2020

Yinghan's Code Sample

Cuda 328 58 Updated Jul 25, 2022

RISC-V Instruction Set Manual

TeX 4,113 715 Updated Jun 11, 2025

Deep learning toolkit-enabled VLSI placement

C++ 822 226 Updated Apr 15, 2025

Bridging polyhedral analysis tools to the MLIR framework

C++ 112 22 Updated Sep 9, 2023

Polyhedral High-Level Synthesis in MLIR

C++ 33 8 Updated Mar 17, 2023

Neural network graphs and training metrics for PyTorch, TensorFlow, and Keras.

Python 1,832 267 Updated Feb 11, 2024

Research and development for optimizing transformers

Python 127 17 Updated Feb 16, 2021

Reproduce Fast ConvNets @CVPR 2020

Python 1 Updated Sep 10, 2021

[FPGA 2021, Best Paper Award] An automated floorplanning and pipelining tool for Vivado HLS.

C++ 122 26 Updated Jan 3, 2023

AutoSA: Polyhedral-Based Systolic Array Compiler

C++ 221 33 Updated Dec 8, 2022