Tags: benbrandt/text-splitter
Tags
feat!: special tokens encoded by default (#512) * feat!: special tokens encoded by default Special tokens are now also encoded by both Huggingface and Tiktoken tokenizers. This is closer to the default behavior on the Python side, and should make sure if a model adds tokens at the beginning or end of a sequence, these are accounted for as well. * test: fix python tests * docs: clarify which tokenizers are affected
PreviousNext