PyLM aims to be a full-featured language model package for Python. It works with language models, which are probability distributions over sequences of words.
Now available:
- Import n-gram models from ARPA files
- A simple language model querying script called lm-query.py
Planned:
- Uniform interface for language models like n-grams (including back-off), topic-based language models, etc.
- Wrappers for other toolkits like RNNLM and word2vec.
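To make the ARPA import and back-off items above more concrete, here is a minimal sketch of how an ARPA file can be read and queried. It is not PyLM's actual code or API: the names parse_arpa and log10_prob, the nested-dict layout, and the tiny sample model are all assumptions made for illustration. It relies only on the usual ARPA layout, where each n-gram line holds a log10 probability, the words, and an optional log10 back-off weight.

```python
def parse_arpa(lines):
    """Parse ARPA lines into {order: {ngram_tuple: (log10_prob, log10_backoff)}}.

    Hypothetical helper for illustration, not part of PyLM.
    """
    models = {}
    order = None
    for raw in lines:
        line = raw.strip()
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue  # blank lines and the count header carry no n-grams
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:-len("-grams:")])  # "\2-grams:" -> 2
            models[order] = {}
            continue
        if order is None:
            continue  # ignore anything before the first n-gram section
        fields = line.split()
        logprob = float(fields[0])
        words = tuple(fields[1:1 + order])
        # A missing back-off weight means log10(1) = 0.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        models[order][words] = (logprob, backoff)
    return models


def log10_prob(models, context, word):
    """Back-off lookup of log10 P(word | context); context is a tuple of words."""
    ngram = context + (word,)
    table = models.get(len(ngram), {})
    if ngram in table:
        return table[ngram][0]
    if not context:
        raise KeyError(word)  # unseen word; real models usually map it to <unk>
    # Pay the back-off weight of the context, then retry with a shorter context.
    backoff = models.get(len(context), {}).get(context, (0.0, 0.0))[1]
    return backoff + log10_prob(models, context[1:], word)


# A made-up toy model, given as a list of lines instead of a file.
SAMPLE = [
    "\\data\\",
    "ngram 1=3",
    "ngram 2=1",
    "",
    "\\1-grams:",
    "-1.0\t<s>\t-0.5",
    "-0.7\thello\t-0.3",
    "-0.9\t</s>",
    "",
    "\\2-grams:",
    "-0.2\t<s> hello",
    "",
    "\\end\\",
]

models = parse_arpa(SAMPLE)
seen = log10_prob(models, ("<s>",), "hello")        # bigram is in the model
backed_off = log10_prob(models, ("hello",), "</s>")  # falls back to the unigram
```

Because parse_arpa accepts any iterable of lines, an open file object can be passed in place of the list. The second query shows the back-off path: the bigram "hello </s>" is absent, so the score is the back-off weight of "hello" plus the unigram score of "</s>".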
The source code is currently hosted on GitHub at: https://github.com/bryandeng/PyLM
Just clone this repo:
git clone https://github.com/bryandeng/PyLM.git
You don't need to install it for now. Change into its directory and try it out.
You can play with lm-query.py like this:
./lm-query.py lm.arpa < test.txt > test.probs 2> test.pp
lm-query.py calculates the probability of each word in every sentence of test.txt and writes the results to stdout in the same format as KenLM. It also writes perplexities to stderr. You can redirect both outputs to files as shown above.
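As a rough illustration of how a perplexity follows from per-word probabilities, here is a minimal sketch. It is not taken from lm-query.py: the function name perplexity is hypothetical, and it assumes the scores are log10 probabilities, as in ARPA files. Tools differ on whether out-of-vocabulary words and the end-of-sentence token are counted, so this sketch simply counts whatever it is given.

```python
import math


def perplexity(log10_probs):
    """Perplexity of a token sequence from its per-token log10 probabilities."""
    if not log10_probs:
        raise ValueError("need at least one token probability")
    # Average log10 probability per token, then invert: PP = 10 ** (-average).
    average = sum(log10_probs) / len(log10_probs)
    return 10.0 ** (-average)


# Four tokens, each with probability 0.25, behave like a uniform
# choice among four words.
uniform_pp = perplexity([math.log10(0.25)] * 4)
```

A lower perplexity means the model found the text less surprising; a model that always assigns each word probability 1/k has perplexity k.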
For now it depends only on a standard installation of Python 3.
GPLv3