llm-analysis

Latency and Memory Analysis of Transformer Models for Training and Inference

Overview

Many formulas and equations are floating around in papers, blog posts, etc. for calculating the training or inference latency and memory of Large Language Models (LLMs) and Transformers. Rather than redoing the math on paper or in spreadsheets, let's automate the boring stuff with llm-analysis ⚙️!

Given a model, GPU, data type, and parallelism configuration, llm-analysis estimates the latency and memory usage of an LLM for training or inference. With llm-analysis, one can easily try out different training or inference setups on paper and better understand system performance across scenarios.
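
For a flavor of the arithmetic being automated, the sketch below works out a rough inference memory footprint (weights plus KV cache) and a crude lower bound on per-step decode latency for a hypothetical 7B-parameter model on a hypothetical 80 GiB GPU. This is not llm-analysis's code or API; it is only the kind of back-of-the-envelope estimate the tool packages up, and every constant in it is an illustrative assumption.

    # A flavor of the arithmetic llm-analysis automates: rough inference memory
    # and a latency lower bound for a hypothetical 7B model on a hypothetical
    # 80 GiB GPU. Every constant below is an illustrative assumption.

    GIB = 1024**3

    # Hypothetical model, hardware, and configuration.
    num_params = 7e9             # total model parameters
    bytes_per_weight = 2         # fp16/bf16 weights and KV cache
    num_layers, hidden_size = 32, 4096
    batch_size, seq_len = 8, 2048
    tp_size = 1                  # tensor-parallel degree
    gpu_mem_gib = 80             # per-GPU memory budget
    peak_tflops = 312            # assumed dense fp16 peak of the GPU
    mem_bw_gb_s = 2000           # assumed HBM bandwidth in GB/s

    # Weight memory, sharded across tensor-parallel ranks.
    weight_bytes = num_params * bytes_per_weight / tp_size

    # KV cache: K and V vectors per token, per layer, for every cached token.
    kv_cache_bytes = (2 * num_layers * hidden_size * bytes_per_weight
                      * batch_size * seq_len / tp_size)

    total_gib = (weight_bytes + kv_cache_bytes) / GIB
    fits = "fits" if total_gib < gpu_mem_gib else "OOM"
    print(f"weights + KV cache ~ {total_gib:.1f} GiB ({fits} on {gpu_mem_gib} GiB)")

    # Per-step decode latency is bounded below by the time to read the weights
    # once (memory-bound) and by ~2*params FLOPs per token (compute-bound).
    t_mem = weight_bytes / (mem_bw_gb_s * 1e9)
    t_compute = 2 * num_params * batch_size / (peak_tflops * 1e12)
    print(f"per-step decode latency >= {max(t_mem, t_compute) * 1e3:.2f} ms")

Under those assumptions this setup comes out at roughly 21 GiB and is memory-bandwidth bound at about 7 ms per decode step; llm-analysis automates this kind of accounting across the full configuration space.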

llm-analysis helps answer questions such as:

  • what batch size, data type, and parallelism scheme to use to get a setup that is feasible (does not run out of memory) and optimal (maximizes throughput under a latency constraint) for training or inference (a brute-force version of this feasibility search is sketched after this list)

  • how long training or inference takes with a given setup, and what it costs (in GPU-hours)

  • how latency and memory change when using a different model, GPU type, number of GPUs, data type for weights and activations, or parallelism configuration (suggesting the performance benefit of model changes, hardware upgrades, quantization, parallelism, etc.)
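
Continuing the hypothetical numbers from the sketch above, the feasibility search mentioned in the first item can be sketched as a brute-force sweep: enumerate weight precision, tensor-parallel degree, and batch size, keep the combinations whose rough memory estimate fits the per-GPU budget, and pick a preferred one. The search space, the memory model, and the preference rule are all illustrative assumptions, not llm-analysis's actual logic or API.

    # A minimal feasibility sweep under illustrative assumptions: keep the
    # (weight bytes, tensor-parallel degree, batch size) combinations whose
    # rough per-GPU memory estimate fits the budget. Not llm-analysis's
    # search logic, just the idea behind it.

    from itertools import product

    GIB = 1024**3

    # Hypothetical 7B model and 80 GiB GPU, as in the earlier sketch.
    NUM_PARAMS = 7e9
    NUM_LAYERS, HIDDEN_SIZE = 32, 4096
    SEQ_LEN = 2048
    GPU_MEM_GIB = 80

    def inference_mem_gib(bytes_per_weight: int, tp_size: int, batch_size: int) -> float:
        """Rough per-GPU memory: sharded weights plus sharded KV cache."""
        weights = NUM_PARAMS * bytes_per_weight / tp_size
        kv_cache = (2 * NUM_LAYERS * HIDDEN_SIZE * bytes_per_weight
                    * batch_size * SEQ_LEN / tp_size)
        return (weights + kv_cache) / GIB

    feasible = [
        (dtype_bytes, tp, bs)
        for dtype_bytes, tp, bs in product((2, 1), (1, 2, 4, 8), (1, 4, 16, 64, 256))
        if inference_mem_gib(dtype_bytes, tp, bs) < GPU_MEM_GIB
    ]

    # Prefer the largest batch on the fewest GPUs as a crude stand-in for
    # "maximize throughput"; a real analysis would also model latency.
    best = max(feasible, key=lambda cfg: (cfg[2], -cfg[1]))
    print(f"{len(feasible)} feasible configs; best: {best[0]}-byte weights, "
          f"tp={best[1]}, batch={best[2]}")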

Installation

To install the latest development version from GitHub:

    pip install --upgrade git+https://github.com/strategicalit/llm-analysis.git@main
