The LLM API Benchmark Tool is a flexible Go-based utility designed to measure and analyze the performance of OpenAI-compatible API endpoints across different concurrency levels. This tool provides in-depth insights into API throughput, generation speed, and token processing capabilities.
- 🚀 Dynamic Concurrency Testing
- 📊 Comprehensive Performance Metrics
- 🔍 Flexible Configuration
- 📝 Markdown Result Reporting
- 🌐 Compatible with Any OpenAI-Like API
- 📏 Arbitrary-Length Dynamic Input Prompts
- **Generation Throughput**
  - Measures tokens generated per second
  - Calculated across all tested concurrency levels
- **Prompt Throughput**
  - Measures input-token processing speed
  - Helps gauge the API's prompt-handling efficiency
- **Time to First Token (TTFT)**
  - Measures initial response latency (see the measurement sketch below)
  - Reports both minimum and maximum TTFT
  - Critical for understanding real-time responsiveness
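As a rough illustration of how TTFT and an approximate generation rate can be measured against a streaming OpenAI-compatible endpoint, consider the following sketch. It is illustrative only, not the tool's actual code; the endpoint URL, API key, and model name are placeholders, and streamed chunks are used as a coarse proxy for tokens.

```go
// ttft_sketch.go — illustrative TTFT and output-rate measurement (not the tool's code).
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
	"time"
)

func main() {
	// Placeholders: substitute your real endpoint, key, and model.
	body, _ := json.Marshal(map[string]any{
		"model":      "gpt-3.5-turbo",
		"stream":     true,
		"max_tokens": 512,
		"messages":   []map[string]string{{"role": "user", "content": "Write a long story."}},
	})
	req, _ := http.NewRequest("POST",
		"https://your-api-endpoint.com/v1/chat/completions", bytes.NewReader(body))
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer YOUR_API_KEY")

	start := time.Now()
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var ttft time.Duration
	chunks := 0 // streamed SSE chunks, a rough stand-in for generated tokens
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") || strings.HasSuffix(line, "[DONE]") {
			continue
		}
		if chunks == 0 {
			ttft = time.Since(start) // first streamed chunk defines TTFT
		}
		chunks++
	}
	elapsed := time.Since(start).Seconds()
	fmt.Printf("TTFT: %.2fs, ~%.1f chunks/s over %d chunks\n",
		ttft.Seconds(), float64(chunks)/elapsed, chunks)
}
```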
- Input Tokens: 45
- Output Tokens: 512
- Test Model: Qwen2.5-7B-Instruct-AWQ
- Latency: 2.20 ms
| Concurrency | Generation Throughput (tokens/s) | Prompt Throughput (tokens/s) | Min TTFT (s) | Max TTFT (s) |
|---|---|---|---|---|
| 1 | 58.49 | 846.81 | 0.05 | 0.05 |
| 2 | 114.09 | 989.94 | 0.08 | 0.09 |
| 4 | 222.62 | 1193.99 | 0.11 | 0.15 |
| 8 | 414.35 | 1479.76 | 0.11 | 0.24 |
| 16 | 752.26 | 1543.29 | 0.13 | 0.47 |
| 32 | 653.94 | 1625.07 | 0.14 | 0.89 |
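Each row above aggregates tokens across that many simultaneous requests. A minimal sketch of the fan-out pattern, assuming a hypothetical `sendRequest` helper that stands in for one API call and returns its generated-token count:

```go
// fanout_sketch.go — illustrative concurrency fan-out, not the tool's code.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// sendRequest is a hypothetical stand-in for one benchmark request;
// it returns the number of tokens the request generated.
func sendRequest() int64 {
	time.Sleep(100 * time.Millisecond)
	return 512
}

func main() {
	for _, level := range []int{1, 2, 4, 8, 16, 32} {
		var tokens int64
		var wg sync.WaitGroup
		start := time.Now()
		for i := 0; i < level; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				atomic.AddInt64(&tokens, sendRequest())
			}()
		}
		wg.Wait()
		elapsed := time.Since(start).Seconds()
		fmt.Printf("concurrency %3d: %.1f tokens/s\n", level, float64(tokens)/elapsed)
	}
}
```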
Linux:

```bash
./llmapibenchmark_linux_amd64 -base_url=https://your-api-endpoint.com/v1
```

Windows:

```cmd
llmapibenchmark_windows_amd64.exe -base_url=https://your-api-endpoint.com/v1
```
Linux:

```bash
./llmapibenchmark_linux_amd64 \
  -base_url=https://your-api-endpoint.com/v1 \
  -apikey=YOUR_API_KEY \
  -model=gpt-3.5-turbo \
  -concurrency=1,2,4,8,16 \
  -max_tokens=512 \
  -numWords=513 \
  -prompt="Your custom prompt here"
```

Windows:

```cmd
llmapibenchmark_windows_amd64.exe ^
  -base_url=https://your-api-endpoint.com/v1 ^
  -apikey=YOUR_API_KEY ^
  -model=gpt-3.5-turbo ^
  -concurrency=1,2,4,8,16 ^
  -max_tokens=512 ^
  -numWords=513 ^
  -prompt="Your custom prompt here"
```
| Parameter | Description | Default | Required |
|---|---|---|---|
| `-base_url` | Base URL for the LLM API endpoint | Empty (must be specified) | Yes |
| `-apikey` | API authentication key | None | No |
| `-model` | Specific model to test | Automatically discovers the first available model | No |
| `-concurrency` | Comma-separated concurrency levels to test | `1,2,4,8,16,32,64,128` | No |
| `-max_tokens` | Maximum tokens to generate per request | `512` | No |
| `-numWords` | Number of words for the input prompt | Not set (optional) | No |
| `-prompt` | Text prompt for generating responses | "Write a long story, no less than 10,000 words, starting from a long, long time ago." | No |
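These parameters map naturally onto Go's standard `flag` package. The following is a hypothetical sketch of that wiring, not the tool's actual source; the variable names and validation message are assumptions:

```go
// flags_sketch.go — hypothetical CLI wiring with the standard flag package.
package main

import (
	"flag"
	"fmt"
	"os"
)

func main() {
	baseURL := flag.String("base_url", "", "base URL for the LLM API endpoint (required)")
	apiKey := flag.String("apikey", "", "API authentication key")
	model := flag.String("model", "", "model to test (empty = first available)")
	concurrency := flag.String("concurrency", "1,2,4,8,16,32,64,128", "comma-separated concurrency levels")
	maxTokens := flag.Int("max_tokens", 512, "maximum tokens to generate per request")
	numWords := flag.Int("numWords", 0, "number of words for the input prompt (0 = unset)")
	prompt := flag.String("prompt", "Write a long story, no less than 10,000 words, starting from a long, long time ago.", "text prompt")
	flag.Parse()

	if *baseURL == "" {
		fmt.Fprintln(os.Stderr, "-base_url must be specified")
		os.Exit(1)
	}
	_ = []any{apiKey, model, concurrency, maxTokens, numWords, prompt} // consumed by the benchmark proper
}
```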
The tool generates:
- Console-based real-time results
- A Markdown file (`API_Throughput_{ModelName}.md`) with detailed results; a sketch of how a report row could be written follows the column list below
- Concurrency: Number of concurrent requests
- Generation Throughput: Tokens generated per second
- Prompt Throughput: Input token processing speed
- Min TTFT: Minimum time to first token
- Max TTFT: Maximum time to first token
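As a rough illustration of how one such row might be appended to the report file, consider this sketch; the `result` struct and its field names are assumptions for illustration, not the tool's actual types:

```go
// report_sketch.go — illustrative Markdown row writer for the report file.
package main

import (
	"fmt"
	"os"
)

// result mirrors one row of the report table; field names are assumptions.
type result struct {
	Concurrency       int
	GenTPS, PromptTPS float64
	MinTTFT, MaxTTFT  float64 // seconds
}

func main() {
	f, err := os.OpenFile("API_Throughput_Qwen2.5-7B-Instruct-AWQ.md",
		os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	r := result{Concurrency: 8, GenTPS: 414.35, PromptTPS: 1479.76, MinTTFT: 0.11, MaxTTFT: 0.24}
	fmt.Fprintf(f, "| %d | %.2f | %.2f | %.2f | %.2f |\n",
		r.Concurrency, r.GenTPS, r.PromptTPS, r.MinTTFT, r.MaxTTFT)
}
```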
- Test with various prompt lengths and complexities
- Compare different models
- Monitor for consistent performance
- Be mindful of API rate limits
- Use `-numWords` to control input length (see the prompt-generation sketch after this list)
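As referenced above, here is one plausible way to synthesize a fixed-length input prompt from a word count; `makePrompt` and its filler vocabulary are illustrative assumptions, not the tool's implementation:

```go
// numwords_sketch.go — hypothetical fixed-word-count prompt generation.
package main

import (
	"fmt"
	"strings"
)

// makePrompt builds a deterministic prompt containing exactly numWords words,
// useful for exercising prompt throughput at a controlled input length.
func makePrompt(numWords int) string {
	filler := []string{"alpha", "beta", "gamma", "delta"}
	words := make([]string, numWords)
	for i := range words {
		words[i] = filler[i%len(filler)]
	}
	return strings.Join(words, " ")
}

func main() {
	p := makePrompt(513)
	fmt.Println(len(strings.Fields(p)), "words") // prints: 513 words
}
```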
- Requires an active API connection
- Results may vary with network conditions
- Does not simulate complex real-world workloads
This tool is intended for performance analysis and should be used responsibly, in compliance with your API provider's usage policies.