Tags: benbrandt/text-splitter

Verified

This commit was signed with the committer’s verified signature.
benbrandt Ben Brandt

v0.26.0

Bump check in ci

v0.25.1

chore: attempt to lower requirement on memchr

v0.25.0

ci: remove caches

v0.24.2

Verified

This commit was signed with the committer’s verified signature.
benbrandt Ben Brandt
prep v0.24.2

v0.24.1

chore: update deps

v0.24.0

Verified

This commit was signed with the committer’s verified signature.
benbrandt Ben Brandt
deps update

v0.23.0

Verified

This commit was signed with the committer’s verified signature.
benbrandt Ben Brandt
chore: prep v0.23

v0.21.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
feat!: special tokens encoded by default (#512)

* feat!: special tokens encoded by default

Special tokens are now also encoded by both Huggingface and Tiktoken tokenizers. This is closer to the default behavior on the Python side, and should make sure if a model adds tokens at the beginning or end of a sequence, these are accounted for as well.

* test: fix python tests

* docs: clarify which tokenizers are affected
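
To make the effect of the v0.21.0 change concrete, here is a toy sketch. It is not the crate's API and does not call Huggingface or Tiktoken. It assumes a hypothetical whitespace tokenizer that adds one beginning-of-sequence and one end-of-sequence token when `add_special_tokens` is true; `count_tokens` and `split_chunks` are made-up names used only for illustration.

```rust
/// Toy token counter: one token per whitespace-separated word.
/// Assumption: the hypothetical model wraps every sequence in one
/// beginning-of-sequence and one end-of-sequence token.
fn count_tokens(text: &str, add_special_tokens: bool) -> usize {
    let words = text.split_whitespace().count();
    if add_special_tokens {
        words + 2
    } else {
        words
    }
}

/// Greedily pack words into chunks whose token count stays within `capacity`.
/// A single word that is already over capacity is still emitted as its own chunk.
fn split_chunks(text: &str, capacity: usize, add_special_tokens: bool) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        // Build the chunk we would get by appending this word.
        let candidate = if current.is_empty() {
            word.to_string()
        } else {
            format!("{} {}", current, word)
        };
        if current.is_empty() || count_tokens(&candidate, add_special_tokens) <= capacity {
            current = candidate;
        } else {
            // The candidate would overflow: close the current chunk and start a new one.
            chunks.push(std::mem::take(&mut current));
            current = word.to_string();
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

With a capacity of 4 tokens, the text `a b c d e f` fits in two chunks when special tokens are ignored, but needs three chunks once the two extra tokens per sequence are counted. Counting them keeps each chunk within the budget the model actually sees.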