troore (Xuechao Wei) / Starred · GitHub

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 1,937 209 Updated May 15, 2025
Fortran 10 1 Updated Sep 14, 2023

GNNear: Accelerating Full-Batch Training of Graph Neural Networks with Near-Memory Processing

C++ 13 1 Updated Sep 15, 2022

The Artifact of NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering

51 5 Updated Aug 11, 2024

LLM Inference analyzer for different hardware platforms

Jupyter Notebook 66 14 Updated Apr 30, 2025

Compare different hardware platforms via the Roofline Model for LLM inference tasks.

Jupyter Notebook 100 4 Updated Mar 13, 2024

Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).

C++ 116 112 Updated May 13, 2025

LLM inference in C/C++

C++ 80,417 11,794 Updated May 17, 2025

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

C++ 359 135 Updated May 7, 2025

Latency and Memory Analysis of Transformer Models for Training and Inference

Python 410 46 Updated Apr 19, 2025

This is the FreePDK45 V1.4 Process Development Kit for the 45 nm technology

HTML 24 1 Updated Feb 22, 2021

Serving multiple LoRA finetuned LLMs as one

Python 1,059 48 Updated May 8, 2024

A benchmark suite for xillybus

VHDL 6 1 Updated Feb 21, 2016

An integrated power, area, and timing modeling framework for multicore and manycore architectures

C++ 185 72 Updated Aug 8, 2020

Python-based research interface for blackbox and hyperparameter optimization, based on the internal Google Vizier Service.

Python 1,559 100 Updated May 7, 2025

Hardware utilities with Spinal HDL

Scala 1 Updated Feb 22, 2022

Provide Python access to the NVML library for GPU diagnostics

Python 235 33 Updated Dec 2, 2024

Cavs: An Efficient Runtime System for Dynamic Neural Networks

C++ 14 3 Updated Sep 18, 2020

Yinghan's Code Sample

Cuda 327 58 Updated Jul 25, 2022

RISC-V Instruction Set Manual

TeX 4,067 705 Updated May 12, 2025

Deep learning toolkit-enabled VLSI placement

C++ 802 219 Updated Apr 15, 2025

Bridging polyhedral analysis tools to the MLIR framework

C++ 111 22 Updated Sep 9, 2023

Polyhedral High-Level Synthesis in MLIR

C++ 30 8 Updated Mar 17, 2023

Neural network graphs and training metrics for PyTorch, TensorFlow, and Keras.

Python 1,829 266 Updated Feb 11, 2024

Research and development for optimizing transformers

Python 126 17 Updated Feb 16, 2021

Reproduce Fast ConvNets @CVPR 2020

Python 1 Updated Sep 10, 2021

[FPGA 2021, Best Paper Award] An automated floorplanning and pipelining tool for Vivado HLS.

C++ 122 26 Updated Jan 3, 2023

AutoSA: Polyhedral-Based Systolic Array Compiler

C++ 221 33 Updated Dec 8, 2022