LLM Calculator

This script estimates the memory requirements and performance of Hugging Face models based on quantization levels. It fetches model parameters, calculates required memory, and analyzes performance with different RAM/VRAM configurations.

It supports windows and Linux, AMD, Intel, and Nvidia GPUs. You will need smi (cuda toolkit (?)) installed to recognise your Nvidia GPU.

Warning: The tool isn't tested outside of Linux+Nvidia, so results may be inaccurate. It's a rough estimate. It may or may not work with MultiGPU setups. If not, use -n to specify how many cards you have (assuming they're all the same). If you have an RTX 3070 and an RTX 3060 for example, you may need to specify -v and -b to be the average values of the two.

Flags

-b, --bandwidth: Override memory bandwidth (GB/s).
-n, --num-gpus: Number of GPUs (default is 1).
-v, --vram: Override VRAM amount per card in GB.

Example: python main.py -b 950 -n 2 -v 24

Dependencies

You'll need these to run it in python. 3.12.3 is what I wrote it in, but any modern version should work.

Install uv and run with:

uv run main.py

For AMD + Linux you'll need sudo apt install pciutils

Nvidia will need drivers, as long as nvidia-smi works this program should.

Intel needs lspci, dunno if that supports windows.

How It Works

Enter a Hugging Face model ID (e.g., microsoft/phi-4) to get its parameter count. The script fetches system RAM and VRAM specs. You can override them with flags. It analyzes memory requirements for several quantization schemes and estimates throughput (tk/s).

Hover over a cell to see how many layers you need to offload, and what context you can fit without KV cache quantizing.

Cli Output

Enter Hugging Face model ID (e.g., microsoft/phi-4): microsoft/phi-4
Model Parameters: 14.7B params (14.70B params)
Total RAM: 33.53 GB
VRAM: 8.00 GB, ~448.0GB/s
Estimated RAM Bandwidth: 64.00 GB/s

Analysis for each quantization level:

FP8:
Run Type: Partial offload
Memory Required: 16.43 GB
GPU Offload Percentage: 48.7%
Estimated tk/s: 5.38

Q6_K_S:
Run Type: Partial offload
Memory Required: 13.86 GB
GPU Offload Percentage: 57.7%
Estimated tk/s: 7.39

Q5_K_S:
Run Type: Partial offload
Memory Required: 11.84 GB
GPU Offload Percentage: 67.6%
Estimated tk/s: 10.63

Q4_K_M:
Run Type: Partial offload
Memory Required: 10.55 GB
GPU Offload Percentage: 75.8%
Estimated tk/s: 14.71

IQ4_XS:
Run Type: Partial offload
Memory Required: 9.64 GB
GPU Offload Percentage: 83.0%
Estimated tk/s: 19.92

Q3_K_M:
Run Type: KV cache offload
Memory Required: 8.90 GB
Estimated tk/s: 45.30

IQ3_XS:
Run Type: All in VRAM
Memory Required: 7.80 GB
Estimated tk/s: 57.45

IQ2_XS:
Run Type: All in VRAM
Memory Required: 6.14 GB
Estimated tk/s: 72.90

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
LLMcalc.py		LLMcalc.py
README.md		README.md
gui.png		gui.png
main.py		main.py
ui.py		ui.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

LLM Calculator

Dependencies

How It Works

About

Uh oh!

Releases

Packages

Languages

banteg/LLMcalc

Folders and files

Latest commit

History

Repository files navigation

LLM Calculator

Dependencies

How It Works

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages