tvtxt 📺✨

⚠️ Work in Progress - Technology Showcase
This is an experimental MVP demonstrating real-time AI capabilities. Not intended as a production-ready product.

Turn any live TV stream into a real-time movie script. AI watches, transcribes, and writes television as cinema.

Ever wondered what your favorite TV show would look like as a screenplay? tvtxt is an AI-powered pipeline that watches live television streams and transforms them into properly formatted movie scripts in real-time. Think of it as having a tireless scriptwriter that never blinks, never sleeps, and never misses a moment.

Live Demo

Watch the magic unfold in real-time: tvtxt live demo

Project Status

This is a proof-of-concept showcase built to demonstrate the integration of several cutting-edge technologies:

Real-time speech recognition.
Vision-language understanding.
Cloud-native AI inference.
Live streaming media processing.

What this is:

A technology demonstration.
An experimental MVP.
A learning playground for AI + media processing.

What this is NOT:

A production-ready application.
A commercial product.
A fully-featured streaming service.

The magic behind the curtain

tvtxt combines cutting-edge AI models with cloud infrastructure to create a TV-to-screenplay transformation:

Modal

Modal handles our cloud GPU infrastructure, running two critical AI workloads:

Parakeet ASR Model (NVIDIA) : Transcribes speech with remarkable accuracy and speed.
Qwen2-VL Vision-Language Model: Describes visual scenes with cinematic flair.

Outlines

Ensures our vision model outputs perfectly formatted JSON responses:

Schema enforcement: Guarantees consistent screenplay structure.

Others

Azure Blob Storage: Temporarily stores captured video frames for visual analysis.
Redis Cloud: Acts as the bridge between our backend pipeline and frontend display.
FastHTML: Creates our live web interface with authentic screenplay styling.
FFmpeg: The unsung hero that handles all media processing.

How the Magic Happens

🎥 Stream Capture: FFmpeg latches onto a live TV stream, extracting both audio and video.
🎧 Audio Analysis: Every 10 seconds, audio chunks are sent to Modal's Parakeet ASR model for transcription.
📸 Frame Extraction: When speech is detected, FFmpeg captures a corresponding video frame.
☁️ Image Upload: The frame is uploaded to Azure Blob Storage and gets a public URL.
👁️ Visual Understanding: Modal's Qwen2-VL model analyzes the image and generates a screenplay-formatted scene description.
💾 Memory Update: The latest transcription and scene description are saved to Redis Cloud.
🖥️ Live Display: FastHTML serves a web page that auto-refreshes, showing the generated screenplay.
🔄 Repeat: The cycle continues, creating an ever-updating script of live television.

Installation & Setup

1. Environment Setup

uv venv
source venv/bin/activate  # or `.venv\Scripts\activate` on Windows
uv pip install -r requirements.txt
modal token new

2. Configure Your Credentials

Create a .env file with your secret weapons:

# Azure Blob Storage (for frame storage)
AZURE_STORAGE_CONNECTION_STRING=your_azure_connection_string

# Redis Cloud (for state management)
REDIS_HOST=your_redis_host
REDIS_PORT=your_redis_port
REDIS_USERNAME=your_redis_username
REDIS_PASSWORD=your_redis_password

# HuggingFace (for model access)
HF_TOKEN=your_huggingface_token

# Modal endpoint (will be generated after deployment)
IMAGE_DESCRIBER_URL=your_modal_endpoint_url

3. Deploy the Vision AI

Launch your scene description model to the cloud:

modal deploy scene_describer.py

Note: Copy the generated endpoint URL to your .env file as IMAGE_DESCRIBER_URL

4. Start the Show

Fire up the transcription pipeline:

modal run ingest.py

5. Watch the Magic

Launch the web interface:

cd app
python main.py

Open your browser to http://localhost:5001 and watch as live TV transforms into screenplay format before your eyes!

Philosophy

tvtxt embraces ephemerality by design. Like live theater, each moment exists only in the present:

No databases: Only the current scene matters.
No history: Previous scripts vanish like morning mist.
No storage: Frames and audio exist only long enough to be processed.

Disclaimer

This project demonstrates real-time AI transcription and visual analysis using Al Jazeera English as a public live stream. No content is stored, archived, or redistributed. The system processes live broadcasts in real-time for educational and demonstration purposes only.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
app		app
data		data
tests		tests
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
ingest.py		ingest.py
requirements.txt		requirements.txt
scene_describer.py		scene_describer.py
screenshot.PNG.jpg		screenshot.PNG.jpg
tvtxt.PNG		tvtxt.PNG

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

tvtxt 📺✨

Live Demo

Project Status

The magic behind the curtain

Modal

Outlines

Others

How the Magic Happens

Installation & Setup

1. Environment Setup

2. Configure Your Credentials

3. Deploy the Vision AI

4. Start the Show

5. Watch the Magic

Philosophy

Disclaimer

About

Uh oh!

Releases

Packages

Languages

License

aastroza/tvtxt

Folders and files

Latest commit

History

Repository files navigation

tvtxt 📺✨

Live Demo

Project Status

The magic behind the curtain

Modal

Outlines

Others

How the Magic Happens

Installation & Setup

1. Environment Setup

2. Configure Your Credentials

3. Deploy the Vision AI

4. Start the Show

5. Watch the Magic

Philosophy

Disclaimer

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages