zlab-foss/shirin_sokhan-core: A Persian Poet Transformer! (fine-tuned GPT-2 on Ganjoor data)
Shirin-Sokhan

A small project to fine-tune a pretrained GPT-2 model on a Persian poetry dataset.

Dataset

We crawled the awesome Ganjoor website and cleaned the text so that it contains only a standard Persian character set. We then prepended the poet's name as a token to each poem. See the data preparation notebook for details on preprocessing, and the data module for the dataset implementation.
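The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the data preparation notebook: the character whitelist, the Arabic-to-Persian letter mapping, and the `<|poet|>` token format are all assumptions chosen for the example.

```python
import re

# Hypothetical whitelist: Persian letters, ZWNJ (U+200C), spaces, newlines.
# The notebook's real character set may differ.
PERSIAN_LETTERS = "آابپتثجچحخدذرزژسشصضطظعغفقگلمنوه"
# Keheh (U+06A9) and Farsi yeh (U+06CC) are written as escapes so they
# cannot be confused with the look-alike Arabic kaf and yeh.
ALLOWED = set(PERSIAN_LETTERS) | {"\u06a9", "\u06cc", "\u200c", " ", "\n"}

# Map Arabic kaf (U+0643), yeh (U+064A) and alef maksura (U+0649) to their
# Persian forms before filtering, so they are kept rather than dropped.
ARABIC_TO_PERSIAN = str.maketrans(
    {"\u0643": "\u06a9", "\u064a": "\u06cc", "\u0649": "\u06cc"}
)


def clean_text(text: str) -> str:
    """Normalize letters, drop characters outside ALLOWED, tidy whitespace."""
    text = text.translate(ARABIC_TO_PERSIAN)
    text = "".join(ch for ch in text if ch in ALLOWED)
    # Collapse repeated spaces on each line and drop lines left empty.
    lines = (re.sub(r" +", " ", line).strip() for line in text.split("\n"))
    return "\n".join(line for line in lines if line)


def poet_token(poet: str) -> str:
    """Hypothetical token format for a poet's name."""
    return f"<|{poet}|>"


def build_example(poet: str, verses: list[str]) -> str:
    """One training text: the poet token first, then the cleaned poem."""
    return poet_token(poet) + " " + clean_text("\n".join(verses))
```

Mapping the Arabic look-alike letters before filtering matters because web-crawled Persian text often mixes them with their Persian counterparts; without the mapping, a strict whitelist would silently delete them from the middle of words.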

Model

We used the pretrained Persian GPT-2 model from ParsGPT [1], together with that model's tokenizer extended with additional special tokens. See the model module for implementation details using the transformers package.
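Adding special tokens to a Hugging Face tokenizer usually also requires resizing the model's embedding matrix so the new token ids have embedding rows. A minimal sketch, assuming one added token per poet in a hypothetical `<|poet|>` format (the exact token set used by the repository is not stated here); `add_special_tokens` and `resize_token_embeddings` are standard transformers methods.

```python
from typing import Iterable


def poet_special_tokens(poets: Iterable[str]) -> dict:
    """Build the dict accepted by tokenizer.add_special_tokens().

    The "<|name|>" format is a hypothetical choice for this sketch.
    """
    tokens = sorted({f"<|{poet}|>" for poet in poets})  # deduplicated, stable order
    return {"additional_special_tokens": tokens}


def extend_vocab(tokenizer, model, poets: Iterable[str]) -> int:
    """Register poet tokens and resize the model's embeddings to match.

    `tokenizer` and `model` are assumed to be a Hugging Face tokenizer and
    causal LM, e.g. loaded with AutoTokenizer / AutoModelForCausalLM.
    """
    num_added = tokenizer.add_special_tokens(poet_special_tokens(poets))
    # GPT-2 tokenizers often have no padding token; reusing EOS is one
    # common workaround (an assumption, not necessarily what the repo does).
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    # Each new token id needs its own embedding row.
    model.resize_token_embeddings(len(tokenizer))
    return num_added
```

In practice, `tokenizer` and `model` would come from `from_pretrained(...)` calls on the ParsGPT checkpoint; the checkpoint ID is deliberately left out here rather than guessed.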

Training

We used PyTorch Lightning as the training framework. See the main notebook for details on training and generation.
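When fine-tuning a causal language model with transformers, a training step typically passes the token ids as their own labels: GPT-2 LM heads shift the labels internally, and positions labeled -100 are ignored by the loss. The helper below sketches that label construction, assuming padded batches that carry an attention mask; it is not taken from the repository's notebook, and the `training_step` in the trailing comment is likewise a hypothetical outline.

```python
from typing import List

# Default ignore_index of torch.nn.CrossEntropyLoss; Hugging Face causal LM
# heads skip label positions with this value when computing the loss.
IGNORE_INDEX = -100


def make_lm_labels(
    input_ids: List[List[int]], attention_mask: List[List[int]]
) -> List[List[int]]:
    """Labels = input ids, with padded positions (mask == 0) set to IGNORE_INDEX.

    GPT-2 LM heads shift labels internally, so no manual shift is needed.
    With torch tensors the equivalent is:
        labels = input_ids.masked_fill(attention_mask == 0, IGNORE_INDEX)
    """
    return [
        [tok if keep else IGNORE_INDEX for tok, keep in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]


# Hypothetical outline of a LightningModule.training_step using these labels:
#     def training_step(self, batch, batch_idx):
#         out = self.model(input_ids=batch["input_ids"],
#                          attention_mask=batch["attention_mask"],
#                          labels=batch["labels"])
#         self.log("train_loss", out.loss)
#         return out.loss
```

Masking padding keeps the loss focused on real verse tokens, which matters most when short poems are padded to the length of long ones within a batch.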

Demo

We used Streamlit to build a demo app. To launch it, run: streamlit run demo.py

Generated Sample

[Screenshot of a generated sample]

References

[1] https://github.com/hooshvare/parsgpt
