shivamCode0/mini-gpt

Mini Language Models

A significant portion of the code is based on the tutorial "Let's build GPT: from scratch, in code, spelled out." by Andrej Karpathy.

How to Run

Run the notebook corresponding to the model you want to use. The data is downloaded and extracted automatically. You will need PyTorch and other common ML libraries already installed.

Models

Bigram Language Model (bigram.ipynb)

This model predicts the next token from the previous token alone. It uses an embedding table whose embedding dimension equals the vocabulary size, making the table square. To generate the next token, the model looks up the previous token's embedding row and applies softmax to it, using that row directly as the logits. This works because each character is a token, so with only 65 tokens the model has just $65^2 = 4225$ parameters. It is a very simple model, but it serves as a good baseline for more complex models, and its performance is not bad for the parameter count. Currently the model is trained on the Tiny Shakespeare dataset, but it can be trained on any text file.
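
For concreteness, here is a minimal sketch of such a bigram model in PyTorch. It follows the structure popularized by the Karpathy tutorial; the class and variable names are illustrative and the notebook's actual code may differ.

```python
import torch.nn as nn
import torch.nn.functional as F

class BigramLanguageModel(nn.Module):
    """Each row of the square embedding table holds the logits for the token
    that follows the row's token, so the lookup itself is the prediction."""

    def __init__(self, vocab_size: int):
        super().__init__()
        # Square table: embedding dim == vocab size, so a row lookup is already the logits.
        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        # idx: (batch, time) tensor of token indices
        logits = self.token_embedding_table(idx)  # (batch, time, vocab_size)
        loss = None
        if targets is not None:
            B, T, C = logits.shape
            loss = F.cross_entropy(logits.view(B * T, C), targets.view(B * T))
        return logits, loss
```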

Sample Output

Sample output with input "LUCENT" and temperature $1.0$ (a sampling loop with temperature is sketched after the output):

LUCENTER: und howiste ty dyotrd,
Theal lerno, y va f m my mulde ben s, r bet!
AMAs sod ke alved.
Thup sthe
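
The temperature divides the logits before the softmax, so $1.0$ samples from the model's unmodified distribution, while smaller values sharpen it. A rough sketch of such a sampling loop (the generate function and its signature are illustrative and assume the model interface sketched above; the notebook's own generation code may differ):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, temperature=1.0):
    # idx: (batch, time) tensor holding the encoded prompt, e.g. the characters of "LUCENT"
    for _ in range(max_new_tokens):
        logits, _ = model(idx)                    # (batch, time, vocab_size)
        logits = logits[:, -1, :] / temperature   # scale logits; 1.0 leaves the distribution unchanged
        probs = F.softmax(logits, dim=-1)         # the embedding row acts directly as the logits
        next_token = torch.multinomial(probs, num_samples=1)
        idx = torch.cat((idx, next_token), dim=1)
    return idx
```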

GPT (not implemented yet)

This model is not implemented yet.

About

A collection of small language models trained using PyTorch
