Interactive Studio Artificial Intelligence
Studio AI is a client/server application for an interactive studio Artificial Intelligence (AI), represented by a dynamically rendered avatar. The avatar receives its input from a microphone connected to a Speech-to-Text engine, performs its reasoning with a Text-to-Text (Chat) engine, and sends its output through a Text-to-Speech engine that drives the avatar, whose audio and video streams are injected back into the studio production process. The result is an AI avatar that the people on the studio stage can interact with in near real time. It is intended for including an AI participant in a discussion or Q&A round.
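The processing chain described above can be sketched as one conversational turn. This is only a minimal illustration under stated assumptions: the interfaces `SpeechToText`, `TextToText` and `TextToSpeech`, the function `runTurn` and the stub engines are hypothetical names invented for this example. They are not taken from the Studio AI code base or from the Deepgram, OpenAI or HeyGen SDKs.

```typescript
// Hypothetical engine interfaces (illustrative only, not the real vendor APIs).
interface SpeechToText { transcribe (audio: Uint8Array): Promise<string> }
interface TextToText   { reply (prompt: string): Promise<string> }
interface TextToSpeech { speak (text: string): Promise<void> }

// One conversational turn: microphone audio in, spoken avatar answer out.
async function runTurn (
    stt: SpeechToText, chat: TextToText, tts: TextToSpeech,
    audio: Uint8Array
): Promise<string> {
    const question = await stt.transcribe(audio)
    if (question.trim() === "")
        return ""                           // nothing was said, so skip the turn
    const answer = await chat.reply(question)
    await tts.speak(answer)                 // the avatar speaks the answer
    return answer
}

// Stub engines standing in for the three cloud services.
const spoken: string[] = []
const stubSTT: SpeechToText = {
    transcribe: async (audio) => audio.length > 0 ? "What is Studio AI?" : ""
}
const stubChat: TextToText = {
    reply: async (prompt) => `You asked: ${prompt}`
}
const stubTTS: TextToSpeech = {
    speak: async (text) => { spoken.push(text) }
}
```

Calling `runTurn(stubSTT, stubChat, stubTTS, audio)` with non-empty audio resolves to the chat answer after the stub Text-to-Speech engine has recorded it. Keeping the three engines behind small interfaces is one possible way to swap a cloud service without touching the turn logic.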
The Speech-to-Text engine is based on the Deepgram cloud service, the Text-to-Text engine is based on the OpenAI ChatGPT cloud service, and the Text-to-Speech engine is based on the HeyGen Interactive Avatar cloud service. Currently, Studio AI works for at least English and German.
NOTICE: As a consequence, you need API keys for all three cloud services to use Studio AI.
The following five screenshots give an impression of Studio AI. The first three screenshots show the settings dialogs of the CONTROL mode. The fourth screenshot shows the control dialog of the CONTROL mode (with German-language examples). The fifth screenshot shows the client in RENDER mode within OBS Studio.
Studio AI is written in TypeScript and consists of a central Node.js-based server component and an HTML5 Single-Page Application (SPA) as the client component. The client component, in turn, runs in two distinct modes: an interactive control mode and an autonomous avatar rendering mode. The clients communicate with each other through their bidirectional WebSocket connections to the server.
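The relay idea (clients talk to each other only via the server) can be illustrated with a small in-memory model. This is a sketch under assumptions, not the actual server: `RelayHub`, `RelayMode`, `RelayHandler` and the message text are hypothetical, and plain callbacks stand in for real WebSocket connections.

```typescript
// Hypothetical types: callbacks stand in for real WebSocket connections.
type RelayMode = "control" | "render"
type RelayHandler = (message: string) => void
interface RelayClient { id: number, mode: RelayMode, handler: RelayHandler }

// In-memory stand-in for the central server component.
class RelayHub {
    private clients: RelayClient[] = []
    private nextId = 0

    // Register a client and return its connection id.
    connect (mode: RelayMode, handler: RelayHandler): number {
        const id = this.nextId++
        this.clients.push({ id, mode, handler })
        return id
    }

    // Forget a client, for example after its connection was closed.
    disconnect (id: number): void {
        this.clients = this.clients.filter((client) => client.id !== id)
    }

    // Forward a message to all clients of the target mode, except the sender.
    send (fromId: number, target: RelayMode, message: string): void {
        this.clients.forEach((client) => {
            if (client.id !== fromId && client.mode === target)
                client.handler(message)
        })
    }
}

// Usage: a control client instructs the render client.
const hub = new RelayHub()
const rendered: string[] = []
const controlId = hub.connect("control", () => {})
const renderId  = hub.connect("render", (message) => { rendered.push(message) })
hub.send(controlId, "render", "say: Hello studio!")
```

After the last line, `rendered` holds the single forwarded message. In a real deployment the hub would wrap WebSocket connections and would likely exchange structured messages instead of plain strings.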
The core of the application can be found in the following software components:
To run Studio AI in production:
- Under Windows/macOS/Linux, install Node.js for the server run-time, Google Chrome for the client run-time (control mode), and either OBS Studio or vMix for the client run-time (renderer mode).
- Create and use a local working copy:
    git clone https://github.com/rse/studio-ai && cd studio-ai
- Provide the API keys of the required cloud services:
    echo "STUDIOAI_DEEPGRAM_API_TOKEN=\"<token1>\"" >.env
    echo "STUDIOAI_OPENAI_API_TOKEN=\"<token2>\"" >>.env
    echo "STUDIOAI_HEYGEN_API_TOKEN=\"<token3>\"" >>.env
- Install all dependencies:
    npm install --production
- Run the production build-process once:
    npm start build
- Run the bare server component:
    npm start server
- Open the client component (control mode) in Google Chrome:
    https://127.0.0.1:12345/
- Use the client component (renderer mode) in OBS Studio or vMix browser sources:
    https://127.0.0.1:12345/#/render
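Because the server depends on all three API keys, it can help to check the `.env` file before the first start. The following is a minimal sketch, assuming the file contains only simple `KEY="value"` lines as written by the `echo` commands above. The helpers `parseEnv` and `missingKeys` are hypothetical and not part of Studio AI, and treating `<...>` values as unfilled placeholders is an assumption of this example.

```typescript
// The three variable names used in the setup steps above.
const REQUIRED_KEYS = [
    "STUDIOAI_DEEPGRAM_API_TOKEN",
    "STUDIOAI_OPENAI_API_TOKEN",
    "STUDIOAI_HEYGEN_API_TOKEN"
]

// Parse simple KEY="value" lines (hypothetical helper, no escaping support).
function parseEnv (content: string): { [ key: string ]: string } {
    const result: { [ key: string ]: string } = {}
    for (const line of content.split(/\r?\n/)) {
        const match = /^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"?(.*?)"?\s*$/.exec(line)
        if (match !== null)
            result[match[1]] = match[2]
    }
    return result
}

// Report keys that are absent, empty, or still a "<...>" placeholder.
function missingKeys (env: { [ key: string ]: string }): string[] {
    return REQUIRED_KEYS.filter((key) => {
        const value = env[key]
        return value === undefined || value === "" || /^<.*>$/.test(value)
    })
}

// Usage: one real-looking token, one placeholder, one key missing entirely.
const sampleEnv = parseEnv(
    "STUDIOAI_DEEPGRAM_API_TOKEN=\"abc123\"\n" +
    "STUDIOAI_OPENAI_API_TOKEN=\"<token2>\"\n"
)
const stillMissing = missingKeys(sampleEnv)
```

Here `stillMissing` lists the OpenAI and HeyGen variables, in that order. To check a real file, the content of `.env` would be read first (for example with Node.js `fs.readFileSync`) and passed to `parseEnv`.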
To set up Studio AI for development:
- Under Windows/macOS/Linux, install Node.js for the server run-time and Google Chrome for the client run-time (both control mode and renderer mode), plus Visual Studio Code with its TypeScript, ESLint and VueJS extensions.
- Create and use a local working copy:
    git clone https://github.com/rse/studio-ai && cd studio-ai
- Provide the API keys of the required cloud services:
    echo "STUDIOAI_DEEPGRAM_API_TOKEN=\"<token1>\"" >.env
    echo "STUDIOAI_OPENAI_API_TOKEN=\"<token2>\"" >>.env
    echo "STUDIOAI_HEYGEN_API_TOKEN=\"<token3>\"" >>.env
- Install all dependencies:
    npm install
- Run the development build-process once:
    npm start build-dev
- Run the development build-process and server component continuously:
    npm start dev
- Open the client component (control mode) in Google Chrome:
    https://127.0.0.1:12345/
- Open the client component (renderer mode) in Google Chrome:
    https://127.0.0.1:12345/#/render
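The two client URLs differ only in their fragment (`#/render`), which suggests that the mode is selected on the client side. A minimal sketch of such a selection follows. The function `clientModeFromURL` and its matching rule are assumptions made for illustration, and the actual routing in the Studio AI client may work differently.

```typescript
type ClientMode = "control" | "render"

// Derive the client mode from the URL fragment (hypothetical helper).
function clientModeFromURL (url: string): ClientMode {
    const index = url.indexOf("#")
    const fragment = index >= 0 ? url.substring(index + 1) : ""
    // "/render" (optionally followed by a sub-path) selects the renderer,
    // everything else falls back to the interactive control mode.
    const isRender = fragment === "/render" || fragment.indexOf("/render/") === 0
    return isRender ? "render" : "control"
}

// Usage with the two URLs from the steps above.
const controlMode = clientModeFromURL("https://127.0.0.1:12345/")
const renderMode  = clientModeFromURL("https://127.0.0.1:12345/#/render")
```

With these inputs, `controlMode` is `"control"` and `renderMode` is `"render"`. Since the fragment is never sent to the server, the same served page can act as either client.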
The Studio AI application was inspired by a prototype application from msg systems ag, which employees of its public sector division and AI cross-division initially crafted for controlling an AI avatar on the panel discussion at the conference Nordl@nder Digital in September 2024. This prototype application was based on an earlier version of the HeyGen Interactive Avatar Demo for their HeyGen Streaming API.
In October 2024, Dr. Ralf S. Engelschall, CTO of msg group, initially integrated this prototype application into his msg Filmstudio. Unfortunately, the implementation did not allow a seamless studio integration. As a result, he took only the ideas of the prototype application and developed Studio AI from scratch in order to allow a more robust integration into a studio production process.
Copyright © 2024 Dr. Ralf S. Engelschall
Licensed under GPL 3.0