This is the companion codebase to my short explorations around the Sesame model.
The introduction blog post is here: https://thomwolf.io/blog/speech-ai.html
Currently preparing the training dataset from a 4-hour NotebookLM dataset: https://huggingface.co/datasets/thomwolf/notebooklm-sample
Current processing scripts are in this folder: ./scripts
[✅] Extract and convert audio
[✅] Diarize dataset
[ ] Audio and text tokenization
[ ] Finetuning CSM
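
As a rough illustration of the kind of data the diarization step feeds into tokenization, here is a minimal sketch that merges consecutive same-speaker segments into turns. It is not the code in ./scripts: the `Segment` class, the `merge_turns` helper, and the `max_gap` threshold are hypothetical names chosen for this example, and the segment format (speaker label plus start/end times in seconds) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical diarization record: speaker label and times in seconds.
    speaker: str
    start: float
    end: float


def merge_turns(segments, max_gap=0.5):
    """Merge adjacent segments from the same speaker into a single turn.

    Two segments are merged when they share a speaker and the silence
    between them is at most `max_gap` seconds (an assumed threshold).
    """
    turns = []
    for seg in sorted(segments, key=lambda s: s.start):
        last = turns[-1] if turns else None
        if (
            last is not None
            and last.speaker == seg.speaker
            and seg.start - last.end <= max_gap
        ):
            # Extend the current turn instead of starting a new one.
            last.end = max(last.end, seg.end)
        else:
            # Copy the segment so the input list is left unmodified.
            turns.append(Segment(seg.speaker, seg.start, seg.end))
    return turns


if __name__ == "__main__":
    example = [
        Segment("A", 0.0, 1.0),
        Segment("A", 1.2, 2.0),
        Segment("B", 2.1, 3.0),
        Segment("A", 5.0, 6.0),
    ]
    for turn in merge_turns(example):
        print(turn)
```

Merging short same-speaker fragments like this is one possible way to get one audio/text pair per conversational turn before tokenization; the right gap threshold would depend on the actual diarizer output.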