brokensandals/rank_files

Overview

Given a set of files, this tool finds the top k according to whatever criteria you specify, using an LLM as the judge.

It's intended for small datasets and small k; e.g., finding the top 10 out of a few hundred files. Comparisons are done pairwise: the LLM is given two documents at a time and asked to pick the better one according to the specified criteria. A single-elimination tournament determines the overall "best" file, with additional rounds to determine the runners-up. For a dataset of n files, the tool invokes the LLM approximately (n-1) + (k-1)·log₂(n) times; for example, ranking the top 10 out of 200 files takes roughly 199 + 9·log₂(200) ≈ 268 comparisons. (This algorithm was chosen over others with better asymptotic runtime complexity because it is more efficient when k and n are small.)
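
As a rough illustration of that scheme (a minimal sketch, not the tool's actual code; the better() judge function is a hypothetical stand-in for the LLM call):

def tournament_top_k(docs, k, better):
    # better(a, b) returns the winner of a pairwise comparison,
    # standing in for the LLM judge described above.
    size = 1
    while size < len(docs):
        size *= 2
    # Complete binary tree in an array: leaves at tree[size:], padded
    # with None "byes" up to a power of two.
    tree = [None] * size + list(docs) + [None] * (size - len(docs))

    def play(i):
        a, b = tree[2 * i], tree[2 * i + 1]
        if a is None or b is None:
            return a if b is None else b  # a bye: no model call needed
        return better(a, b)

    for i in range(size - 1, 0, -1):  # initial bracket: n-1 comparisons
        tree[i] = play(i)

    ranked = []
    for _ in range(min(k, len(docs))):
        champ = tree[1]
        ranked.append(champ)
        leaf = tree.index(champ, size)  # champion's leaf (docs assumed distinct)
        tree[leaf] = None
        i = leaf // 2
        while i >= 1:  # replay the champion's path: ~log2(n) matches
            tree[i] = play(i)
            i //= 2
    return ranked

The initial bracket costs n-1 comparisons; each subsequent rank replays only the matches along the previous champion's path, roughly log₂(n) of them.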

The supported model APIs are Ollama and Anthropic (Claude).

Current limitations / known issues:

  • Only text files are supported (no images, PDFs, etc.)
  • Models that always emit chain-of-thought, e.g. DeepSeek-R1, are not supported
  • Improperly formatted model output (especially likely with small models) results in an unrecoverable error

Installation

You need Python 3.13.2 or greater installed. Then:

pip install rank-files

Usage

First, put all the files you want to rank into one folder. Basic usage of the tool looks like this:

rank-files 'Each document is a book review. The best document is the book review that contains the most thoughtful original content, as opposed to just summarizing or quoting the book.' path/to/input-folder -k 10

However, you'll probably need additional setup depending on which model and provider you want to use, as described in the following sections.

Ollama

By default the tool assumes you have Ollama running locally. You can point it at a remote Ollama instance by setting the OLLAMA_HOST environment variable to the appropriate URL, as shown below.
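
For example (the hostname here is illustrative; 11434 is Ollama's default port):

OLLAMA_HOST=http://my-ollama-server:11434 rank-files 'your criteria here' path/to/input-folder -k 10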

You must have whatever model you want to use installed in Ollama ahead of time. By default the tool tries to use gemma3:4b, which you can install via ollama pull gemma3:4b. However, this model may not be powerful enough for use cases like the one in the example above. You can set the RANK_FILES_MODEL environment variable to use a different model, e.g.:

RANK_FILES_MODEL=llama3.3:70b rank-files 'Each document is a book review. The best document is the book review that contains the most thoughtful original content, as opposed to just summarizing or quoting the book.' path/to/input-folder -k 10

Claude

Alternatively, you can use Claude by setting ANTHROPIC_API_KEY, RANK_FILES_PROVIDER, and RANK_FILES_MODEL. Remember, this costs money, and the number of API invocations grows roughly linearly with the number of files (see the formula in the Overview); make sure you know what you're doing.

ANTHROPIC_API_KEY=... RANK_FILES_PROVIDER=anthropic RANK_FILES_MODEL='claude-3-5-haiku-latest' rank-files 'Each document is a book review. The best document is the book review that contains the most thoughtful original content, as opposed to just summarizing or quoting the book.' path/to/input-folder -k 10

Caching

When you run the tool, it creates a file named rank-files-cache.sqlite3 in the current directory. This stores hashes of prompts along with the responses received for them, so the tool won't ask the same model to compare the same two files twice.

This means that if the tool is interrupted, no important work is lost: you can rerun it with the same parameters (the criteria must be exactly the same) and it will use the cached results for any comparisons that were already performed.

If you want the cache somewhere else, set the RANK_FILES_CACHE environment variable to the desired path and filename; or set it to :memory: to keep the cache in memory only, with nothing persisted to disk.
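
For illustration, the general idea of such a cache (hash the prompt, store the response, check before calling the model) can be sketched in a few lines; this is not the tool's actual schema:

import hashlib
import sqlite3

conn = sqlite3.connect("rank-files-cache.sqlite3")  # or ":memory:"
conn.execute(
    "CREATE TABLE IF NOT EXISTS cache (prompt_hash TEXT PRIMARY KEY, response TEXT)"
)

def cached_compare(prompt, call_model):
    # Key on a hash of the full prompt (criteria plus both documents);
    # the real tool's key format may differ.
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT response FROM cache WHERE prompt_hash = ?", (key,)
    ).fetchone()
    if row:
        return row[0]              # cache hit: skip the model call
    response = call_model(prompt)  # cache miss: invoke the LLM
    conn.execute("INSERT INTO cache VALUES (?, ?)", (key, response))
    conn.commit()
    return response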

Development

You need uv installed.

Run tests with uv run pytest.
