This project was developed for the LlamaCon Hackathon 2025 in San Francisco. The primary goal was to explore and demonstrate how the multimodal image understanding capabilities of Llama 4 can be applied to a practical use case: assisting CCTV control room operators in identifying predefined video events without requiring any model fine-tuning.
- Integrate video streams (e.g., RTSP).
- Allow users to define specific events to track.
- Utilize Llama 4's image analysis to detect these events in video chunks.
- Generate real-time alerts for detected events.
- Store event data for reporting and analysis.
- Backend: Python (FastAPI), Llama 4
- Frontend: Next.js
The system processes video streams by chunking them, analyzing chunks for defined events using Llama 4, and storing detected events in a database.
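The README does not say how the chunker cuts the stream into N-second files. One plausible approach, assumed here, is to run ffmpeg's segment muxer with stream copy. In that setup ffmpeg cuts at keyframes, so chunk lengths are approximate. The helper `build_chunker_command` and the `chunk_%05d.mp4` naming are hypothetical and only illustrate the idea.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def build_chunker_command(rtsp_url: str, out_dir: Path, chunk_seconds: int = 10) -> list[str]:
    """Hypothetical helper: ffmpeg argv that copies an RTSP stream into N-second MP4 files.

    Assumes ffmpeg's segment muxer; '-c copy' avoids re-encoding, so cuts land
    on keyframes and each file is only approximately chunk_seconds long.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    pattern = out_dir / "chunk_%05d.mp4"  # hypothetical naming scheme
    return [
        "ffmpeg",
        "-rtsp_transport", "tcp",          # TCP is often more reliable than UDP for RTSP
        "-i", rtsp_url,
        "-c", "copy",                      # no re-encode
        "-f", "segment",                   # write output as a sequence of files
        "-segment_time", str(chunk_seconds),
        "-reset_timestamps", "1",          # each chunk starts at t=0
        str(pattern),
    ]


if __name__ == "__main__":
    cmd = build_chunker_command("rtsp://camera.example/stream", Path("chunks"), 10)
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True) would start the long-running capture.
```

A watcher on the output directory (or ffmpeg's `-segment_list` option) could then tell the rest of the pipeline when each file is complete.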
```mermaid
flowchart LR
  %% ── Core stages ──
  subgraph core [Core stages]
    RTSP["RTSP Stream"]
    Chunker["Video Stream Chunker (saves N-second files)"]
    Detector["Video Event Detector (using Llama 4)"]
    DBWriter["Database Writer"]
  end

  %% ── Supporting resources ──
  subgraph resources [Supporting resources]
    FS[(Filesystem)]
    ChunkQueue["Chunk Queue (file paths)"]
    EventQueue["Event Queue (event JSON)"]
    DB[(Database)]
  end

  %% ── Data flow ──
  RTSP --> Chunker
  %% writes video files
  Chunker --> FS
  %% enqueues file paths
  Chunker --> ChunkQueue
  ChunkQueue --> Detector
  %% enqueues detected events
  Detector --> EventQueue
  EventQueue --> DBWriter
  %% persists events
  DBWriter --> DB
```
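The diagram describes a producer/consumer pipeline: file paths flow through a chunk queue to the detector, and detected events flow as JSON through an event queue to a database writer. The sketch below wires those stages together with Python's `queue` and `threading` modules. It assumes SQLite as the database (the README does not name one) and replaces the Llama 4 call with a hypothetical stub, `detect_events`, that flags any chunk whose path contains "fall". The chunker is simulated by putting paths on the queue directly.

```python
from __future__ import annotations

import json
import os
import queue
import sqlite3
import tempfile
import threading

SENTINEL = None  # pushed through each queue to signal shutdown


def detect_events(chunk_path: str) -> list[dict]:
    """Hypothetical stand-in for the Llama 4 detector (no model call here)."""
    if "fall" in chunk_path:
        return [{"event_code": "PERSON_FALL", "chunk": chunk_path}]
    return []


def detector_worker(chunk_q: queue.Queue, event_q: queue.Queue) -> None:
    # Consume file paths until the sentinel arrives; emit events as JSON strings.
    while (path := chunk_q.get()) is not SENTINEL:
        for event in detect_events(path):
            event_q.put(json.dumps(event))
    event_q.put(SENTINEL)  # propagate shutdown downstream


def db_writer_worker(event_q: queue.Queue, db_path: str) -> None:
    # The connection is created and used only in this thread.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.commit()
    while (payload := event_q.get()) is not SENTINEL:
        conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
        conn.commit()
    conn.close()


def run_pipeline(chunk_paths: list[str], db_path: str) -> int:
    """Run detector and writer threads over the given paths; return the stored event count."""
    chunk_q: queue.Queue = queue.Queue()
    event_q: queue.Queue = queue.Queue()
    workers = [
        threading.Thread(target=detector_worker, args=(chunk_q, event_q)),
        threading.Thread(target=db_writer_worker, args=(event_q, db_path)),
    ]
    for w in workers:
        w.start()
    for path in chunk_paths:  # in the real system the chunker enqueues each finished file
        chunk_q.put(path)
    chunk_q.put(SENTINEL)
    for w in workers:
        w.join()
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "events.db")
        print(run_pipeline(["chunks/c1_fall.mp4", "chunks/c2.mp4"], db))  # prints 1
```

Decoupling the stages with queues means a slow model call only backs up the chunk queue. Capture and database writes keep running independently.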
Here's a walkthrough of the user interface:
- Welcome Screen: The application greets the user and explains its purpose.
- Stream Configuration: Users enter the preview and video stream URLs.
- Event Definition: Users define the events to monitor, giving each one an event code, a description, and detection guidelines for Llama 4. Multiple events can be added.
- Monitoring Dashboard: Shows the live video feed (or preview), the list of defined events, and a log of detected events.
- Event Detail: Shows the details of an event when it is detected or selected.
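The README does not show the prompt sent to Llama 4 or the format of its reply. The sketch below assumes the detector folds each event's code, description, and guidelines into a single text prompt and asks the model to answer in JSON. It then keeps only detections whose codes were actually defined. `EventDefinition`, `build_prompt`, `parse_response`, and the `{"detected": [...]}` schema are all hypothetical, and the model call itself (with the video frames) is omitted.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class EventDefinition:
    code: str         # e.g. "PERSON_FALL"
    description: str  # what the operator wants to catch
    guidelines: str   # extra detection instructions for the model


def build_prompt(events: list[EventDefinition]) -> str:
    """Hypothetical prompt: list every defined event and request a JSON answer."""
    lines = [
        "You are assisting a CCTV operator. Look at the attached frames and report",
        "which of the following events occur. Reply with JSON only, in the form",
        '{"detected": [{"event_code": "...", "reason": "..."}]}.',
        "",
    ]
    for ev in events:
        lines.append(f"- {ev.code}: {ev.description} Guidelines: {ev.guidelines}")
    return "\n".join(lines)


def parse_response(text: str, events: list[EventDefinition]) -> list[dict]:
    """Extract the JSON object from the reply and drop unknown or malformed detections."""
    known = {ev.code for ev in events}
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return []  # no JSON object in the reply
    try:
        data = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return []
    detected = data.get("detected", []) if isinstance(data, dict) else []
    if not isinstance(detected, list):
        return []
    return [d for d in detected if isinstance(d, dict) and d.get("event_code") in known]
```

Filtering against the defined codes guards against the model reporting events the operator never asked for.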