I have implemented a simple GPT (decoder-only transformer) model using PyTorch and trained it on a toy dataset of the GITA. The model is trained to predict the next word in a sentence given the previous words.
- `toygpt.py`: Contains the implementation of the GPT model and the training loop (a rough sketch of such a model is shown after this list)
- `gita.txt`: Contains the toy GITA dataset
- `toygpt_nb.ipynb`: Jupyter notebook containing the code to train and evaluate the model (I trained it on Google Colab, as I am GPU poor lol)
- `toygpt_gita_output.txt`: Contains the text generated by the model
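For reference, a minimal decoder-only model could look roughly like the sketch below. This is only an illustrative sketch, not the exact code in `toygpt.py`: the class name `ToyGPT`, the use of `nn.TransformerEncoderLayer` with a causal mask, and the embedding/head sizes are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGPT(nn.Module):
    """Minimal decoder-only transformer: token + position embeddings,
    a stack of causally masked self-attention blocks, and an LM head.
    (Illustrative sketch; toygpt.py may be structured differently.)"""
    def __init__(self, vocab_size, block_size=32, n_embd=64, n_head=4, n_layer=4):
        super().__init__()
        self.block_size = block_size
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(n_embd, n_head, 4 * n_embd,
                                       batch_first=True, norm_first=True)
            for _ in range(n_layer)
        ])
        self.ln_f = nn.LayerNorm(n_embd)
        self.head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx, targets=None):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # causal mask so each position only attends to earlier positions
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(idx.device)
        for block in self.blocks:
            x = block(x, src_mask=mask)
        logits = self.head(self.ln_f(x))
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss

    @torch.no_grad()
    def generate(self, idx, max_new_tokens):
        # autoregressive sampling: keep feeding the last block_size tokens back in
        for _ in range(max_new_tokens):
            idx_cond = idx[:, -self.block_size:]
            logits, _ = self(idx_cond)
            probs = F.softmax(logits[:, -1, :], dim=-1)
            idx_next = torch.multinomial(probs, num_samples=1)
            idx = torch.cat([idx, idx_next], dim=1)
        return idx
```

Calling `generate` on a trained model with a short prompt of token ids is what produces the kind of output saved in `toygpt_gita_output.txt`.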
- I have trained a decoder-only transformer for this task.
- The model is trained on a toy dataset of the GITA.
- The model is trained for 10,000 epochs with a batch size of 16 and a block size of 32 (a rough training-loop sketch follows this list).
- The model is able to generate coherent text that resembles the training data, but it is not able to generate meaningful text beyond the training data.
- It is a toy model and is not trained on a large dataset.
- The model can be further improved by training on a larger dataset and using a more sophisticated language modeling objective.
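To make the bullet points above concrete, here is a rough sketch of a character-level data pipeline and training loop using the stated batch size, block size, and number of training steps. The `get_batch` helper, the AdamW learning rate, and the character-level encoding are assumptions for illustration and may differ from what `toygpt.py` actually does.

```python
import torch

# Hypothetical character-level pipeline over the GITA text.
with open("gita.txt", "r", encoding="utf-8") as f:
    text = f.read()
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

batch_size, block_size = 16, 32
max_iters = 10_000  # the 10,000 "epochs" above, treated here as training steps

def get_batch():
    # sample random windows of block_size tokens; targets are shifted by one
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x, y

model = ToyGPT(vocab_size=len(chars), block_size=block_size)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(max_iters):
    xb, yb = get_batch()
    _, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
```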
Here is an image that shows the input (the text the model is trained on) and the output (the text generated by the model):