So I got extremely bored over the holidays and decided to make a fun project to see if it's possible to automate the kind of content I was seeing on TikTok.
To make creating videos easier, I have built a web interface.
Image Overlays:
To add your own image overlays, follow these steps:
1. Create a new folder inside the `assets/` directory. Choose a descriptive name for your folder (e.g., `my_overlays`).
2. Place your image files into this new folder. For optimal display, aim for images around 512 x 512 pixels in size. Common image formats like PNG and JPG are supported.
3. Open the `main.py` file.
4. Locate the `asset_name` variable.
5. Replace the existing value of `asset_name` with the name of the folder you created in step 1 (e.g., if your folder is `my_overlays`, set `asset_name = "my_overlays"`).
Audio Samples:
To use custom audio samples:
1. Place your audio files directly into the `assets/` directory. Common audio formats like MP3 and WAV are likely supported.
2. Open the `main.py` file.
3. Locate the relevant audio configuration (the specific variable name might vary depending on your code).
4. Update the audio configuration to point to your audio file name (a minimal sketch follows below).
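For illustration, the change in `main.py` might look something like this; the variable name here is purely hypothetical, so check the actual audio configuration in the file:

```python
# Hypothetical: point the narration voice at your own reference clip in assets/.
# The real variable name in main.py may differ.
audio_sample = "assets/my_voice_sample.wav"
```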
Pre-loaded Assets:
For your convenience, four "brain rot" character images are already included in the `assets/` directory. You can use these directly by setting the `asset_name` in `main.py` to one of the following (case-sensitive):

- `trump`
- `spongebob`
- `lebron`
- `griffin`
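For example, selecting one of the bundled characters is a one-line change in `main.py` (surrounding code not shown):

```python
# Use one of the pre-loaded asset folders; the value is case-sensitive.
asset_name = "spongebob"  # or "trump", "lebron", "griffin"
```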
The script first decides whether the provided link is a subreddit thread listing or a direct post link. If it is a thread listing, the candidate posts go through a sentiment-analysis filter (VADER plus Llama 3.3 70B) to decide which thread to select; otherwise it goes straight to scraping.
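As a rough illustration of the VADER half of that filter (the actual selection heuristic, and how it is combined with Llama 3.3 70B, may differ in the project):

```python
# Sketch: score candidate thread titles with VADER and pick the most emotionally
# charged one. The "most negative wins" heuristic here is an assumption for illustration.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def pick_thread(titles: list[str]) -> str:
    analyzer = SentimentIntensityAnalyzer()
    scored = [(analyzer.polarity_scores(t)["compound"], t) for t in titles]
    return min(scored)[1]  # lowest compound score = most negative title
```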
Simple web scraping using Reddit's public API to collect the title and story from the Reddit post.
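One way to do this (not necessarily how the project does it) is to hit Reddit's public JSON endpoint for a post:

```python
# Sketch: fetch a post's title and body via Reddit's public .json endpoint.
# The project may use a different client; the User-Agent string here is a placeholder.
import requests

def fetch_post(url: str) -> tuple[str, str]:
    resp = requests.get(url.rstrip("/") + ".json",
                        headers={"User-Agent": "obrainrot-demo/0.1"})
    resp.raise_for_status()
    post = resp.json()[0]["data"]["children"][0]["data"]
    return post["title"], post["selftext"]
```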
Using Coqui's XTTS v2 (which is super lightweight, portable and accurate), I converted the text into audio. Coqui's TTS also lets you clone from sample audio, so I used the commonly heard TikTok narrator voice.
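In case it helps, text-to-speech with XTTS v2 via the Coqui TTS library looks roughly like this (file paths are placeholders):

```python
# Sketch: clone a reference voice and narrate the scraped story with XTTS v2.
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text="Today I found out that my roommate has been...",
    speaker_wav="assets/tiktok_voice_sample.wav",  # reference clip of the target voice
    language="en",
    file_path="output/narration.wav",
)
```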
Removal of certain punctuation and special characters via regex before we carry out forced alignment.
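The cleanup is along these lines (the exact set of characters removed is an assumption):

```python
# Sketch: strip characters that would confuse the aligner and normalise whitespace.
import re

def clean_for_alignment(text: str) -> str:
    text = re.sub(r"[\"'()\[\]*_#]", "", text)  # quotes, brackets, markdown leftovers
    text = re.sub(r"\s+", " ", text)            # collapse runs of whitespace
    return text.strip()
```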
The most important step in generating the video was aligning the audio with the text in order to produce the subtitles. This was achieved using forced alignment: we used wav2vec2 and based it on Motu Hira's tutorial on Forced Alignment with Wav2Vec2. It takes frame-wise label probabilities from the audio (that is, the voice that we generated), builds a trellis matrix representing the probability of each label at each time step, and then backtracks the most likely path through the trellis.
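The tutorial builds the trellis and backtracking by hand; newer torchaudio versions expose the same CTC-based alignment as `torchaudio.functional.forced_align`, which is used in the sketch below for brevity. Paths and the transcript are placeholders:

```python
# Sketch: frame-wise emissions from wav2vec2, then forced alignment between the
# generated narration and its cleaned, upper-cased, '|'-separated transcript.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
model = bundle.get_model()
labels = bundle.get_labels()                      # ('-', '|', 'E', 'T', ...)

waveform, sr = torchaudio.load("output/narration.wav")
waveform = waveform.mean(0, keepdim=True)         # ensure mono, shape (1, time)
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    emissions, _ = model(waveform)                # frame-wise label scores
    log_probs = torch.log_softmax(emissions, dim=-1)

transcript = "TODAY|I|FOUND|OUT"                  # placeholder transcript
dictionary = {c: i for i, c in enumerate(labels)}
tokens = torch.tensor([[dictionary[c] for c in transcript]], dtype=torch.int32)

# Walk the most likely monotonic path through the trellis of per-frame label
# probabilities; returns one label index (and score) per audio frame.
frame_labels, frame_scores = torchaudio.functional.forced_align(log_probs, tokens, blank=0)
```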
Once we have the audio, the video sample, as well as the timestamped subtitles (which are in .ass format, by the way), we can generate the video using some simple ffmpeg magic.
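The ffmpeg step amounts to burning the subtitles onto the background clip and muxing in the narration; roughly like this (file names and codec choices are assumptions):

```python
# Sketch: burn .ass subtitles into the background clip and add the generated voiceover.
import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-i", "assets/background.mp4",       # background / gameplay clip
    "-i", "output/narration.wav",        # Coqui-generated narration
    "-vf", "ass=output/subtitles.ass",   # burn in the timestamped subtitles
    "-map", "0:v", "-map", "1:a",
    "-c:v", "libx264", "-c:a", "aac",
    "-shortest", "output/final_video.mp4",
], check=True)
```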
This is a new algorithm that superimposes images on the video feed: for every sentence that is spoken, it swaps in a new overlay image. The algorithm is surprisingly more complex than it looks, since the overlays have to line up with the timestamps and it has to know when to switch at each new sentence.
`video_generator.py`
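The core scheduling idea can be sketched as pairing each sentence's start timestamp with the next overlay image; this is an illustration only, not the exact logic in `video_generator.py`:

```python
# Sketch: cycle through the overlay images in the chosen asset folder, switching
# to the next one at the start of every spoken sentence.
from itertools import cycle
from pathlib import Path

def build_overlay_schedule(sentence_starts: list[float], asset_dir: str) -> list[tuple[float, Path]]:
    images = cycle(sorted(Path(asset_dir).glob("*.png")))  # assumes PNG overlays
    # Each entry: from this timestamp onwards, show this image.
    return [(start, next(images)) for start in sentence_starts]

schedule = build_overlay_schedule([0.0, 4.2, 9.8, 15.1], "assets/my_overlays")
```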
Option 1: Local Installation
All of these are important scripts within the project, each deliberately separated so that it is easier to include any upgrades in the future (and whatnot).
Afterwards, just run the script `python server.py` and you are good to go!
Take note: when turning on the Reddit thread link option, it will prompt for a Groq API key, which can be acquired when you sign up as a Groq member. I will look into integrating other LLMs or letting you create your own heuristics in the future :).
Option 2: Docker Installation (Recommended)
This project now includes a Dockerfile for easier setup.
Prerequisites:

- Docker installed on your system.
1. Clone the Repository:

   ```
   git clone https://github.com/harvestingmoon/OBrainRot.git
   cd OBrainRot
   ```
2. Build the Docker Image:

   ```
   docker build -t obrainrot:latest .
   ```
3. Run the Docker Container:

   ```
   docker run -it -p 8000:5000 obrainrot:latest /bin/bash
   ```

   This command will:

   - Run a container in interactive mode (`-it`).
   - Map port 8000 on your host machine to port 5000 in the container (`-p 8000:5000`).
   - Start a bash shell inside the container.
4. Edit the Reddit Link: Inside the container's bash shell, navigate to the project directory and edit the Reddit link in `main.py` using a text editor like `nano` or `vim`.
5. Run the Script:

   ```
   python3 main.py
   ```
Note: Currently, `server.py` is not functioning correctly within the Docker environment.
So far, yes there are, but the hope is to create a website or Gradio interface to make this more user friendly, and to create more kinds of brain rot videos in the future (I am looking at OpenSora, but there are no plans as of now).
I would like to thank Motu Hira for creating the tutorial on Forced Alignment using Wav2Vec2. Without it, the subtitles would not have worked (the original plan was to use CMUSphinx, but the lack of community support made it difficult for me to work with).
Here is the original Tutorial if anyone is interested: