Original docs
https://llama-stack.readthedocs.io/en/latest/getting_started/
Official Tutorial
https://rh-aiservices-bu.github.io/llama-stack-tutorial/modules/index.html
Note: Most of these examples use the "client-server" approach. There is also a library mode, shown in some of the examples but commented out.
Terminal 1
ollama serve
Terminal 2
Use the "keepalive" parameter otherwise ollama quickly returns that memory back to the host
ollama run llama3.2:3b-instruct-fp16 --keepalive 60m
Note: this blocks the terminal because ollama run
opens an interactive chat with the model.
Use
/bye
to exit the chat, and then run ollama ps
to see if the model is still loaded in memory
Terminal 3
There is some repetition below, as different examples expect slightly different env vars
export LLAMA_STACK_MODEL="meta-llama/Llama-3.2-3B-Instruct"
export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
export LLAMA_STACK_PORT=8321
export LLAMA_STACK_SERVER=http://localhost:$LLAMA_STACK_PORT
export LLAMA_STACK_ENDPOINT=$LLAMA_STACK_SERVER
export LLAMA_STACK_ENDPOINT_OPENAI=$LLAMA_STACK_ENDPOINT/v1/openai/v1
Terminal 3
start clean
uv cache clean
rm -rf .venv
rm -rf ~/.llama/distributions/ollama/
brew install python@3.12
uv venv .venv --python "/opt/homebrew/bin/python3.12"
source .venv/bin/activate
double check your python version
python --version
Check out requirements.txt and install the dependencies
uv pip install -r requirements.txt
Note: requirements.txt dependencies are NOT versioned in most cases. Trying to stay on latest/greatest.
uv pip list | grep llama
llama_stack 0.2.13
llama_stack_client 0.2.13
ollama 0.5.1
uv run --with llama-stack llama stack build --template ollama --image-type venv --run
Reset local data used by the Llama Stack server if using docker or podman.
rm -rf ~/.llama
mkdir -p ~/.llama
ls ~/.llama
docker run -it \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
llamastack/distribution-ollama \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$LLAMA_STACK_MODEL \
--env OLLAMA_URL=http://host.docker.internal:11434
or
podman run -it \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
--env INFERENCE_MODEL=$LLAMA_STACK_MODEL \
--env OLLAMA_URL=http://host.containers.internal:11434 \
llamastack/distribution-ollama \
--port $LLAMA_STACK_PORT
You may need to start your podman backend
podman machine start
Terminal 4
source .venv/bin/activate
llama-stack-client configure --endpoint $LLAMA_STACK_SERVER
> Enter the API key (leave empty if no key is needed):
Hit Enter
llama-stack-client models list
Available Models
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ model_type ┃ identifier ┃ provider_resource_id ┃ metadata ┃ provider_id ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ llm │ meta-llama/Llama-3.2-3B-Instruct │ llama3.2:3b-instruct-fp16 │ │ ollama │
├──────────────┼──────────────────────────────────────┼──────────────────────────────┼───────────────────────────────────┼───────────────────────┤
│ embedding │ all-MiniLM-L6-v2 │ all-MiniLM-L6-v2 │ {'embedding_dimension': 384.0} │ sentence-transformers │
└──────────────┴──────────────────────────────────────┴──────────────────────────────┴───────────────────────────────────┴───────────────────────┘
Total models: 2
llama-stack-client \
inference chat-completion \
--message "hello, what model are you?"
ChatCompletionResponse(
completion_message=CompletionMessage(
content='Hello! I\'m an AI designed to assist and communicate with users in a helpful and informative way. My primary function is to
provide information, answer questions, and engage in conversation on a wide range of topics.\n\nI\'m a type of artificial intelligence (AI) called
a large language model, which means I\'ve been trained on a massive dataset of text from various sources, including books, articles, research
papers, and online conversations. This training allows me to understand and generate human-like language, including grammar, syntax, and
vocabulary.\n\nMy architecture is based on a transformer model, which is a type of neural network designed specifically for natural language
processing tasks like language translation, question-answering, and text generation.\n\nI don\'t have a specific name or brand, but I\'m often
referred to as a "chatbot" or a "conversational AI." My goal is to provide accurate and helpful information, while also being friendly and
engaging in conversation. How can I assist you today?',
role='assistant',
stop_reason='end_of_turn',
tool_calls=[]
),
logprobs=None
)
I use jq
to parse the JSON returned by the curl command. It is optional; your eyeballs can parse the JSON.
brew install jq
Using Llama Stack API endpoint
curl -sS $LLAMA_STACK_SERVER/v1/models -H "Content-Type: application/json" | jq -r '.data[].identifier'
Results:
meta-llama/Llama-3.2-3B-Instruct
all-MiniLM-L6-v2
Using OpenAI API endpoint
curl -sS $LLAMA_STACK_ENDPOINT_OPENAI/models | jq -r '.data[].id'
Chat completions using Llama Stack API
curl -sS $LLAMA_STACK_SERVER/v1/inference/chat-completion \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d "{
\"model_id\": \"$LLAMA_STACK_MODEL\",
\"messages\": [{\"role\": \"user\", \"content\": \"what model are you?\"}],
\"temperature\": 0.0
}" | jq -r '.completion_message | select(.role == "assistant") | .content'
Chat completions using OpenAI API
API_KEY="none"
MODEL_NAME="meta-llama/Llama-3.2-3B-Instruct"
QUESTION="What model are you?"
curl -sS $LLAMA_STACK_ENDPOINT_OPENAI/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d "{
\"model\": \"$MODEL_NAME\",
\"messages\": [{\"role\": \"user\", \"content\": \"$QUESTION\"}],
\"temperature\": 0.0
}" | jq -r '.choices[0].message.content'
To prove connectivity and find out more about the capabilities of the server
Code originally from https://llama-stack.readthedocs.io/en/latest/getting_started/index.html#run-inference-with-python-sdk
python 0-test-remote-client.py
Lots of configuration output and then a haiku
Here is a haiku about coding:
Lines of code unfold
Logic flows through digital night
Beauty in the bits
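For reference, 0-test-remote-client.py amounts to roughly the following. This is a sketch based on the readthedocs example linked above, assuming the llama-stack-client SDK and the env vars exported earlier; the actual file may differ.

import os
from llama_stack_client import LlamaStackClient

# Point the client at the running Llama Stack server
client = LlamaStackClient(base_url=os.environ["LLAMA_STACK_SERVER"])

response = client.inference.chat_completion(
    model_id=os.environ["LLAMA_STACK_MODEL"],
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about coding."},
    ],
)
print(response.completion_message.content)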
Test OpenAI API compatibility
Note: "v1/openai/v1" appended to the Llama Stack server host/port
python 0-test-remote-client-openai.py
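A minimal sketch of what the OpenAI-compatibility test does: the standard openai package pointed at the Llama Stack OpenAI-compatible endpoint. The model name and prompt here are illustrative; the actual script may differ.

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["LLAMA_STACK_ENDPOINT_OPENAI"],  # .../v1/openai/v1
    api_key="none",  # Llama Stack does not require a key by default
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Write a haiku about coding."}],
)
print(response.choices[0].message.content)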
python 1-models.py
--- Available models: ---
- all-MiniLM-L6-v2
- meta-llama/Llama-3.2-3B-Instruct
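1-models.py likely amounts to listing the models the server knows about, as in this sketch (assuming the llama-stack-client SDK):

import os
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url=os.environ["LLAMA_STACK_SERVER"])
print("--- Available models: ---")
for m in client.models.list():
    print(f"- {m.identifier}")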
Make sure ollama has the model running
As of 0.2.2, the --keepalive flag is no longer required; however, you do need to ollama pull the model first.
ollama run llama3.1:8b-instruct-fp16 --keepalive 60m
python 1-models-add.py
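A sketch of what 1-models-add.py likely does, assuming the same register() signature used later in this doc for the vision model; the metadata is taken from the table below.

import os
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url=os.environ["LLAMA_STACK_SERVER"])
client.models.register(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    model_type="llm",
    provider_id="ollama",
    provider_model_id="llama3.1:8b-instruct-fp16",  # must already be pulled/running in ollama
    metadata={"description": "llama3.1:8b-instruct-fp16 via ollama"},
)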
llama-stack-client models list
Available Models
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ model_type ┃ identifier ┃ provider_resource_id ┃ metadata ┃ provider_id ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ llm │ meta-llama/Llama-3.2-3B-Instruct │ llama3.2:3b-instruct-fp16 │ │ ollama │
├────────────┼──────────────────────────────────┼───────────────────────────┼────────────────────────────────┼──────────────────────┤
│ embedding │ all-MiniLM-L6-v2 │ all-MiniLM-L6-v2 │ {'embedding_dimension': 384.0} │ sentence-transforme… │
├────────────┼──────────────────────────────────┼───────────────────────────┼────────────────────────────────┼──────────────────────┤
│ llm │ meta-llama/Llama-3.1:8B-Instruct │ llama3.1:8b-instruct-fp16 │ {'description': │ ollama │
│ │ │ │ 'llama3.1:8b-instruct-fp16 via │ │
│ │ │ │ ollama'} │ │
└────────────┴──────────────────────────────────┴───────────────────────────┴────────────────────────────────┴──────────────────────┘
Total models: 3
We will add the Guard model later for shields/safety
Note: we do use the 8B model later; this is just to exercise the API
python 1-models-delete.py
python 2-chat-completions.py
python 2-chat-completions-weather.py
Please note that I'm a text-based AI model and do not have the ability to access current information in real-time. If you need the most up-to-date temperature, please try one of the above options.
Because "what's the weather?" is the way you show off tools and MCP later on
python 2-chat-completions-logger.py
Use of dotenv and logger. A bit more advanced, sprinkled throughout some of the following examples. Also shows off a hallucination:
Burr Sutter is an American entrepreneur and the co-founder of GitHub, a web-based platform for version control and collaboration on software development projects. He co-founded GitHub in 2008 with Tom Preston-Werner and Chris Wanstrath.
export API_KEY=none
export MODEL_NAME="meta-llama/Llama-3.2-3B-Instruct"
export INFERENCE_SERVER_URL=$LLAMA_STACK_SERVER/v1/openai/v1
python 2-chat-completions-weather-openai.py
Uses a Pydantic model
python 3-structured-output.py
python 3-structured-output-leopard.py
Structured output means the LLM returns responses in a defined schema, which allows for programmatic control
With OpenAI API
python 3-structured-output-openai.py
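A hedged sketch of structured output through the OpenAI-compatible endpoint: a Pydantic model defines the schema and the SDK parses the response into it. The field names below are illustrative, not taken from 3-structured-output-openai.py.

import os
from openai import OpenAI
from pydantic import BaseModel

class AnimalFact(BaseModel):  # hypothetical schema
    name: str
    top_speed_kmh: float

client = OpenAI(base_url=os.environ["LLAMA_STACK_ENDPOINT_OPENAI"], api_key="none")

completion = client.beta.chat.completions.parse(
    model="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "How fast can a leopard run?"}],
    response_format=AnimalFact,
)
print(completion.choices[0].message.parsed)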
Using tools, JSON declaration
export API_KEY=none
export MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct"
export INFERENCE_SERVER_URL=$LLAMA_STACK_SERVER/v1/openai/v1
python 4-tools-weather-openai.py
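The JSON tool declaration looks roughly like this sketch against the OpenAI-compatible endpoint; the get_weather schema is illustrative and the actual 4-tools-weather-openai.py may differ.

import os
from openai import OpenAI

client = OpenAI(base_url=os.environ["INFERENCE_SERVER_URL"], api_key=os.environ["API_KEY"])

# Declare a tool the model may choose to call
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model=os.environ["MODEL_NAME"],
    messages=[{"role": "user", "content": "What is the weather in Atlanta?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)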
Get an API KEY
export TAVILY_SEARCH_API_KEY=your-key
Restart Llama Stack server
Add meta-llama/Llama-3.1-8B-Instruct if you have not already
python 4-tools-tavily.py
Proves you have connectivity to Tavily
python list-tools.py
python 5-basic-agent.py
Get an API KEY
export TAVILY_SEARCH_API_KEY=your-key
Add meta-llama/Llama-3.1-8B-Instruct if you have not already
python 1-models-add.py
python 1-models.py
--- Available models: ---
all-MiniLM-L6-v2 - ollama - all-minilm:latest
meta-llama/Llama-3.1-8B-Instruct - ollama - llama3.1:8b-instruct-fp16
meta-llama/Llama-3.2-3B-Instruct - ollama - llama3.2:3b-instruct-fp16
Note: you do not need both 3B and 8B normally.
python 5-basic-agent-websearch-tool.py
If it works, it should result in something like the following.
The winner of the last Super Bowl was the Philadelphia Eagles who defeated the Kansas City Chiefs with a score of 40-22 in Super Bowl LIX.
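The agent examples follow roughly this shape, assuming the Agent helper exported by llama-stack-client 0.2.x; the instructions string is illustrative, and dropping the tools list gives you the plain 5-basic-agent.py case.

import os
from llama_stack_client import Agent, AgentEventLogger, LlamaStackClient

client = LlamaStackClient(base_url=os.environ["LLAMA_STACK_SERVER"])

agent = Agent(
    client,
    model="meta-llama/Llama-3.1-8B-Instruct",
    instructions="You are a helpful assistant.",
    tools=["builtin::websearch"],  # remove for a plain, tool-less agent
)
session_id = agent.create_session("demo-session")
turn = agent.create_turn(
    messages=[{"role": "user", "content": "Who won the last Super Bowl?"}],
    session_id=session_id,
)
for log in AgentEventLogger().log(turn):
    log.print()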
With Tavily Search (already pre-registered)
export TAVILY_SEARCH_API_KEY=your-key
And there is a test-tavily.py
to test your key/connectivity
python 5-basic-agent-tavily-tool.py
Note: it seems to perform the web search but does NOT provide a "good" answer. You should also notice the logs indicate it is attempting to use Brave search yet needs the Tavily API key.
python 5-basic-agent-brave-tool.py
If the version you need is not yet on pypi.org, install client directly from github
If you need to clean your previously downloaded pips:
rm -rf .venv
python3.11 -m venv .venv
source .venv/bin/activate
# pip install git+https://github.com/meta-llama/llama-stack-client-python.git
pip install llama-stack-client
pip install llama-stack
pip install aiosqlite
pip install ollama
pip install openai
pip install datasets
pip install opentelemetry-instrumentation
pip install opentelemetry-exporter-otlp
pip install faiss-cpu
pip install mcp
pip install autoevals
# pip install opentelemetry-exporter-prometheus
python 5-basic-rag.py
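The basic RAG flow is roughly the following sketch, based on the Llama Stack RAG docs rather than the exact contents of 5-basic-rag.py; class and argument names (RAGDocument, provider_id "faiss") may differ between versions.

import os
from llama_stack_client import LlamaStackClient, RAGDocument

client = LlamaStackClient(base_url=os.environ["LLAMA_STACK_SERVER"])

# Register a vector database backed by the faiss provider and the bundled embedding model
vector_db_id = "my-docs"
client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
    provider_id="faiss",
)

# Chunk and insert a document, then query it back
documents = [
    RAGDocument(
        document_id="doc-1",
        content="Llama Stack is a set of APIs for building GenAI applications.",
        mime_type="text/plain",
        metadata={},
    )
]
client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=256,
)
result = client.tool_runtime.rag_tool.query(
    vector_db_ids=[vector_db_id],
    content="What is Llama Stack?",
)
print(result)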
ollama pull llama-guard3:8b-q4_0
ollama run llama-guard3:8b-q4_0 --keepalive 60m
ollama ps
NAME ID SIZE PROCESSOR UNTIL
llama3.2:3b-instruct-fp16 195a8c01d91e 8.6 GB 100% GPU 59 minutes from now
llama-guard3:8b-q4_0 d8d7fb8dfa56 6.7 GB 100% GPU 59 minutes from now
llama3.1:8b-instruct-fp16 4aacac419454 17 GB 100% GPU 59 minutes from now
If the model is not running in ollama, you will get failures; Llama Stack server startup looks for the already running ollama models.
You MAY need to shut down any previously running Llama Stack server
docker ps
Note: your container id will be different
docker stop fc3eae32f44c
but starting/restarting clean is often a good idea
rm -rf ~/.llama
mkdir -p ~/.llama
docker run -it \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
llamastack/distribution-ollama \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$LLAMA_STACK_MODEL \
--env OLLAMA_URL=http://host.docker.internal:11434
Register the guard/guardian model
python 1-models-add-guard.py
Register the shield and attempt to use it
python 6-shield-content.py
See the registered shields
python list-shields.py
Shield(
│ identifier='content_safety',
│ provider_id='llama-guard',
│ provider_resource_id='Llama-Guard-3-8B',
│ type='shield',
│ params={}
)
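A sketch of registering and invoking the shield programmatically, assuming the shield/provider ids shown in the list-shields.py output above; the example message is illustrative.

import os
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url=os.environ["LLAMA_STACK_SERVER"])

client.shields.register(
    shield_id="content_safety",
    provider_id="llama-guard",
    provider_shield_id="Llama-Guard-3-8B",
)

# Run the shield against a single user message and inspect any violation
result = client.safety.run_shield(
    shield_id="content_safety",
    messages=[{"role": "user", "content": "How do I make a dangerous weapon?"}],
    params={},
)
print(result.violation)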
Now an agent + shield
python 6-agent-shield.py
Two of the four messages will cause violations
The violation codes
A simple MCP server is one of the easiest things to get up and running in a terminal; here we use a small Node math server.
New terminal to run the MCP server process
cd mcp-servers/node-mcp-server-math
See its readme.md
npx -y supergateway --port 8002 --stdio "node index.mjs"
Register the toolgroup
Note: if the MCP server is not up/on, the registration will often fail with a 500 error.
curl -X POST -H "Content-Type: application/json" --data '{ "provider_id" : "model-context-protocol", "toolgroup_id" : "mcp::my-node-server-math", "mcp_endpoint" : { "uri" : "http://host.docker.internal:8002/sse"}}' http://localhost:8321/v1/toolgroups
See if it is registered
python providers-tools-list.py
python 7-mcp-client-node-server.py
In this response, I used the function `add` to add 2 and 2. The result is 4.
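Using the registered MCP toolgroup from an agent looks roughly like this sketch; the toolgroup_id matches the curl registration above, while the instructions string is illustrative.

import os
from llama_stack_client import Agent, AgentEventLogger, LlamaStackClient

client = LlamaStackClient(base_url=os.environ["LLAMA_STACK_SERVER"])

agent = Agent(
    client,
    model="meta-llama/Llama-3.1-8B-Instruct",
    instructions="Use the math tools when arithmetic is required.",
    tools=["mcp::my-node-server-math"],  # the toolgroup registered above
)
session_id = agent.create_session("mcp-math")
turn = agent.create_turn(
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    session_id=session_id,
)
for log in AgentEventLogger().log(turn):
    log.print()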
Go for a 2nd MCP Server
cd mcp-servers/node-mcp-server-other
review readme.md for startup instructions
curl -X POST -H "Content-Type: application/json" --data '{ "provider_id" : "model-context-protocol", "toolgroup_id" : "mcp::my-node-server-other", "mcp_endpoint" : { "uri" : "http://host.docker.internal:8004/sse"}}' http://localhost:8321/v1/toolgroups
python 7-mcp-client-node-server-other.py
inference> {"function": "fetch_customer_details", "parameters": {"customer_id": "C100"}}<|python_tag|>{"function": "fetch_customer_details", "parameters": {"customer_id": "C100"}}
Included in the MCP python-sdk
git clone https://github.com/modelcontextprotocol/python-sdk
cd python-sdk/examples/servers/simple-tool
review README.md
export MCP_PORT=8005
uv run mcp-simple-tool --transport sse --port $MCP_PORT
INFO: Started server process [84213]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8005 (Press CTRL+C to quit)
curl -X POST -H "Content-Type: application/json" --data '{ "provider_id" : "model-context-protocol", "toolgroup_id" : "mcp::mcp-website-fetcher", "mcp_endpoint" : { "uri" : "http://host.docker.internal:8005/sse"}}' http://localhost:8321/v1/toolgroups
python 7-mcp-client-web-page-fetcher.py
VLM use cases - image to text:
- how many objects in an image
- what is the total amount on an invoice
- hand-writing recognition
- generating marketing copy for a new product
ollama run llama3.2-vision:11b --keepalive 60m
Uses about 12 GB of VRAM (or unified memory on Apple Silicon M1-M4 Macs)
import os
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url=os.environ["LLAMA_STACK_SERVER"])

model = client.models.register(
    model_id="meta-llama/Llama-3.2-vision-11B",
    model_type="llm",
    provider_id="ollama",
    provider_model_id="llama3.2-vision:11b",
    metadata={"description": "llama3.2-vision:11b via ollama"}
)
export LLAMA_STACK_VISION_MODEL="meta-llama/Llama-3.2-vision-11B"
# OR
export LLAMA_STACK_VISION_MODEL="ibm/Granite-3.2-vision-2B"
python 8-chat-completions-vision-1.py
python 8-chat-completions-vision-2.py
python 8-chat-completions-vision-3.py
python 8-chat-completions-vision-4.py
python 8-chat-completions-vision-5.py
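For reference, an image-to-text call through the OpenAI-compatible endpoint looks roughly like this, base64-encoding a local file; the image path is hypothetical and the actual 8-chat-completions-vision-*.py scripts may use the Llama Stack inference API instead.

import base64
import os
from openai import OpenAI

client = OpenAI(base_url=os.environ["LLAMA_STACK_ENDPOINT_OPENAI"], api_key="none")

with open("invoice.png", "rb") as f:  # hypothetical image file
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model=os.environ["LLAMA_STACK_VISION_MODEL"],
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is the total amount on this invoice?"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)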
https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct
Using Llama Stack as an embedded library/framework instead of a remote API server.
Still using ollama
https://llama-stack.readthedocs.io/en/latest/distributions/importing_as_library.html
python3.11 -m venv .venv
source .venv/bin/activate
pip install llama-stack
pip install aiosqlite
pip install ollama
pip install openai
pip install datasets
pip install opentelemetry-instrumentation
pip install opentelemetry-exporter-prometheus
pip install opentelemetry-exporter-otlp
pip install faiss-cpu
pip install mcp
pip install autoevals
pip install bwrap
python 0-test-library-client.py
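Library mode boils down to roughly this sketch, based on the importing_as_library docs linked above; 0-test-library-client.py may differ in details, and the import path can change between llama-stack versions.

import os
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

# "ollama" is the distribution template name; inference still goes through the local ollama
client = LlamaStackAsLibraryClient("ollama")
client.initialize()

response = client.inference.chat_completion(
    model_id=os.environ["INFERENCE_MODEL"],
    messages=[{"role": "user", "content": "Write a haiku about coding."}],
)
print(response.completion_message.content)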
cd streamlit-chat-gui
review readme.md
https://llama-stack.readthedocs.io/en/latest/playground/index.html
export LLAMA_STACK_ENDPOINT=http://localhost:8321
git clone https://github.com/meta-llama/llama-stack
cd llama-stack/llama_stack/distribution/ui
pip install -r requirements.txt
pip install llama_stack
streamlit run app.py
Check out the README.md in that directory for more ideas
Podman: if I run llama-stack in podman, I use this as the address of my mcp-server: http://host.containers.internal:8000/sse
get weather via agent API and decorator 7
MCP server with sqlite database
https://github.com/meta-llama/llama-stack/tree/main/docs/zero_to_hero_guide
Shields output
https://llama-stack.readthedocs.io/en/latest/building_applications/rag.html versus https://github.com/burrsutter/python-plain-agentic-examples/tree/main/rag
Working Tavily+Agent
Working Brave+Agent
PatternFly Chatbot https://github.com/patternfly/chatbot
More MCP examples https://towardsdatascience.com/clear-intro-to-mcp/
https://redhat-internal.slack.com/archives/C08CD63RDLG/p1743181170314839
docker kill $(docker ps -q)
docker rm $(docker ps -a -q)
docker rmi $(docker images -q)
docker system prune -a --volumes
podman kill $(podman ps -q)
podman rm $(podman ps -a -q)
podman rmi $(podman images -q)
podman system prune -a --volumes