A significant portion of the code is adapted from Andrej Karpathy's tutorial "Let's build GPT: from scratch, in code, spelled out."
Just run the notebook corresponding to the model you want to try. The data should be downloaded and extracted automatically. You will need PyTorch and other common ML libraries already installed.
Bigram Language Model (bigram.ipynb)
This is a model that predicts the next token based only on the previous token. It uses an embedding table whose embedding dimension equals the vocab size, making it a square (vocab_size x vocab_size) matrix. To generate the next token, it looks up the embedding of the current token and treats that row as the logits, applying softmax to get a probability distribution to sample from. This works because each character is a token, so with only 65 tokens the full table of next-token distributions is small enough to learn directly.
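A minimal sketch of this idea in PyTorch is shown below (it follows the bigram model from Karpathy's tutorial; class and variable names here are illustrative, not necessarily the ones used in bigram.ipynb):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BigramLanguageModel(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        # Square (vocab_size x vocab_size) table: row i holds the
        # logits over the next token given current token i.
        self.token_embedding = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx):
        # idx: (batch, time) tensor of token indices
        logits = self.token_embedding(idx)  # (batch, time, vocab_size)
        return logits

    @torch.no_grad()
    def generate(self, idx, max_new_tokens):
        for _ in range(max_new_tokens):
            logits = self(idx)[:, -1, :]       # logits for the last token only
            probs = F.softmax(logits, dim=-1)  # embedding row used as logits
            next_idx = torch.multinomial(probs, num_samples=1)
            idx = torch.cat([idx, next_idx], dim=1)
        return idx
```

Because each step conditions on a single character, the samples tend to look word-like for a few characters and then drift, which matches the sample output below.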
Sample output with input "LUCENT", temperature:
LUCENTER: und howiste ty dyotrd,
Theal lerno, y va f m my mulde ben s, r bet!
AMAs sod ke alved.
Thup sthe
This model is not implemented yet.