Tags · ollama/ollama · GitHub


Tags

v0.9.2

Revert "ggml: Export GPU UUIDs" (#11115)

This reverts commit aaa7818.

v0.9.1

readme: add GPTranslate to community integrations (#11071)

v0.9.1-rc1

tools: loosen tool parsing to allow for more formats (#11030)

v0.9.1-rc0

spawn desktop quickly (#11011)

Give the desktop app a hint to start fast.

v0.9.0

add thinking support to the api and cli (#10584)

- Both `/api/generate` and `/api/chat` now accept a `"think"`
  option that specifies whether thinking mode should be enabled
  (see the API sketch after this list)
- Templates get passed this new option so, e.g., qwen3's template can
  put `/think` or `/no_think` in the system prompt depending on the
  value of the setting
- Models' thinking support is inferred by inspecting model templates.
  The prefix and suffix the parser uses to identify thinking blocks are
  also automatically inferred from templates
- Thinking control & parsing is opt-in via the API to prevent breaking
  existing API consumers. If the `"think"` option is not specified, the
  behavior is unchanged from previous versions of ollama
- Add parsing for thinking blocks in both streaming and non-streaming
  modes for both `/generate` and `/chat`
- Update the CLI to make use of these changes. Users can pass `--think`
  or `--think=false` to control thinking, or during an interactive
  session they can use the commands `/set think` or `/set nothink`
- A `--hidethinking` option has also been added to the CLI. This makes
  it easy to use thinking in scripting scenarios like
  `ollama run qwen3 --think --hidethinking "my question here"` where you
  just want to see the answer but still want the benefits of thinking
  models
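
Below is a minimal Go sketch of exercising the new option over HTTP, based only on the commit message above: the `/api/chat` endpoint and the `"think"` field come from the description, while the default port, the `"stream": false` flag, and the shape of the response are assumptions for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Request body per the commit message above: "think" toggles thinking mode.
	body, _ := json.Marshal(map[string]any{
		"model": "qwen3",
		"messages": []map[string]string{
			{"role": "user", "content": "Why is the sky blue?"},
		},
		"think":  true,  // opt in; omitting it keeps the pre-0.9.0 behavior
		"stream": false, // ask for a single JSON object rather than a stream (assumed default is streaming)
	})

	resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // parsed thinking is returned alongside the final answer
}
```

Because the option is opt-in, leaving `"think"` out of the request keeps the behavior of earlier ollama versions, which is why existing API consumers are unaffected.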

v0.9.0-rc0

add thinking support to the api and cli (#10584)

(Same commit message as v0.9.0; see above.)

v0.8.0

client: add request signing to the client (#10881)

If OLLAMA_AUTH is set, sign each request with a timestamp and pass the signature in the token header.
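
A hedged Go sketch of what timestamp-based request signing could look like, keyed off the `OLLAMA_AUTH` environment variable described above; the key type (ed25519), the signed payload layout, and the header names are assumptions here, not the client's actual implementation.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"net/http"
	"os"
	"strconv"
	"time"
)

// signRequest attaches a timestamp and a signature over the request line,
// so the server can verify freshness and authenticity.
// Payload layout, key type, and header names are hypothetical.
func signRequest(req *http.Request, priv ed25519.PrivateKey) {
	ts := strconv.FormatInt(time.Now().Unix(), 10)
	payload := req.Method + "," + req.URL.Path + "," + ts // hypothetical payload layout
	sig := ed25519.Sign(priv, []byte(payload))

	req.Header.Set("X-Timestamp", ts)                                       // hypothetical header name
	req.Header.Set("Authorization", base64.StdEncoding.EncodeToString(sig)) // "token header" per the commit; exact name assumed
}

func main() {
	if os.Getenv("OLLAMA_AUTH") == "" {
		fmt.Println("OLLAMA_AUTH not set; sending requests unsigned")
		return
	}

	// Stand-in key; a real client would load a persisted identity instead.
	_, priv, _ := ed25519.GenerateKey(rand.Reader)

	req, _ := http.NewRequest(http.MethodGet, "http://localhost:11434/api/tags", nil)
	signRequest(req, priv)
	fmt.Println(req.Header.Get("Authorization"))
}
```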

v0.8.0-rc0

tools: relax JSON parse constraints for tool calling (#10872)

v0.7.1

llama: add minimum memory for grammar (#10820)

v0.7.1-rc2

llama: add minimum memory for grammar (#10820)
