GenLM Bytes is a Python library for byte-level language modeling. It provides algorithms for turning token-level language models into byte-level ones.
See the docs for details and basic usage.
Note: This project is under active development — expect bugs, missing features, and breaking changes. Please report any issues or suggestions in the issue tracker.
This library requires Python >= 3.11 and can be installed using pip:

```bash
pip install genlm-bytes
```
For faster and less error-prone installs, consider using uv:

```bash
uv pip install genlm-bytes
```
See DEVELOPING.md for details on how to install the project for development.
```python
from genlm.bytes import ByteBeamState, BeamParams
from genlm.backend import load_model_by_name

# Note: the snippet below uses top-level `await`, so it must be run
# inside an async context (e.g., via `asyncio` or a Jupyter notebook).

# Load a token-level language model by its Hugging Face model name.
# (Note: for fast GPU inference, specify `backend="vllm"`.)
llm = load_model_by_name("gpt2-medium")

# Initialize a beam state with a maximum beam width of 5 and a prune threshold
# of 0.05 (higher thresholds lead to more aggressive pruning).
beam = await ByteBeamState.initial(llm, BeamParams(K=5, prune_threshold=0.05))

# Populate the beam state with byte context.
beam = await beam.prefill(b"An apple a day keeps the ")

# Get the log-probability distribution over the next byte.
logp_next = await beam.logp_next()
logp_next.pretty().top(5)
# Example output:
# b'd' -0.5766762743944795
# b'b' -2.8732729803080233
# b's' -2.9816068063730867
# b'w' -3.3758250127787264
# b'm' -3.528177345847574

# Prune the beam and extend it with a new byte.
new_beam = await (beam.prune() << 100)  # 100 is the byte value of 'd'
```
See basic usage for a more detailed example.
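To build intuition for `prune_threshold`, here is a small standalone sketch (independent of genlm-bytes, whose exact pruning rule may differ): a hypothetical pruner that normalizes the beam's log-weights into probabilities and drops hypotheses whose probability falls below the threshold. This is why higher thresholds prune more aggressively.

```python
import math

def prune(hyps, threshold):
    """Hypothetical threshold pruning sketch.

    hyps: dict mapping hypothesis -> log-weight (unnormalized).
    Drops hypotheses whose normalized probability is below `threshold`.
    """
    # Normalize via log-sum-exp for numerical stability.
    m = max(hyps.values())
    log_z = m + math.log(sum(math.exp(w - m) for w in hyps.values()))
    probs = {h: math.exp(w - log_z) for h, w in hyps.items()}
    return {h: w for h, w in hyps.items() if probs[h] >= threshold}

# Log-weights loosely based on the example output above.
hyps = {b"d": -0.58, b"b": -2.87, b"s": -2.98, b"w": -3.38, b"m": -3.53}
survivors = prune(hyps, 0.05)
# b'w' and b'm' fall below 5% of the normalized mass and are dropped:
# survivors keeps b'd', b'b', b's'
```

With `threshold=0.05`, only hypotheses carrying at least 5% of the beam's probability mass survive; raising the threshold shrinks the beam further.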