Studio AI

Interactive Studio Artificial Intelligence

About

Studio AI is a client/server application for an interactive studio Artificial Intelligence (AI), represented by a dynamically rendered avatar. The avatar receives its inputs via a microphone device connected to a Speech-to-Text engine, performs its reasoning process with a Text-to-Text (Chat) engine, and sends its outputs through a Text-to-Speech engine for driving an AI avatar whose audio and video streams are injected back into the studio production process. The result is an AI avatar the people on the studio stage can interact with in nearly real-time. This is intended for including an AI participant in a discussion or Q&A round.
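
The following TypeScript sketch illustrates this pipeline in simplified form. It is only an assumption about the general control flow; the interfaces SpeechToText, ChatEngine, and AvatarRenderer are hypothetical placeholders, not the actual Studio AI classes or vendor SDK APIs.

    /*  hypothetical pipeline sketch -- all names below are
        illustrative placeholders, not actual Studio AI code  */
    interface SpeechToText   { transcribe (audio: Buffer): Promise<string> }  /*  e.g. backed by Deepgram  */
    interface ChatEngine     { complete   (prompt: string): Promise<string> } /*  e.g. backed by OpenAI    */
    interface AvatarRenderer { speak      (text: string): Promise<void> }     /*  e.g. backed by HeyGen    */

    /*  one round-trip of the interaction loop:
        microphone audio -> transcript -> chat answer -> avatar speech  */
    async function interact (
        stt:    SpeechToText,
        chat:   ChatEngine,
        avatar: AvatarRenderer,
        audio:  Buffer
    ): Promise<void> {
        const question = await stt.transcribe(audio)    /*  Speech-to-Text  */
        const answer   = await chat.complete(question)  /*  Text-to-Text    */
        await avatar.speak(answer)                      /*  Text-to-Speech  */
    }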

The Speech-to-Text engine is based on the Deepgram cloud service, the Text-to-Text engine is based on the OpenAI ChatGPT cloud service, and the Text-to-Speech engine is based on the HeyGen Interactive Avatar cloud service. Currently Studio AI works at least for English and German languages.

NOTICE: As a consequence, to use Studio AI you need API keys for all three of these cloud services.

[collage image]

Screenshots

The following five screenshots give an impression of Studio AI. The first three screenshots show the settings dialogs of the CONTROL mode. The fourth screenshot shows the control dialog of the CONTROL mode (with German language examples). The fifth screenshot shows the client in RENDER mode within OBS Studio.

[screenshots 1-5]

Architecture

Studio AI is written in TypeScript and consists of a central Node.js-based server component and an HTML5 Single-Page Application (SPA) as the client component. The client component, in turn, runs in two distinct modes: an interactive control mode and an autonomous avatar rendering mode. The clients communicate with each other through their bi-directional WebSocket connections to the server.
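
As a rough illustration of this communication pattern, the following sketch shows a minimal WebSocket relay built on the ws package, where every message from one client is forwarded to all other connected clients. This is only a sketch of the general mechanism under assumed details (plain WebSocket on port 12345); it is not the actual Studio AI server code.

    import { WebSocketServer, WebSocket } from "ws"

    /*  minimal relay sketch: broadcast every message received from one
        client (e.g. control mode) to all other clients (e.g. render mode)  */
    const wss = new WebSocketServer({ port: 12345 })  /*  port assumed for illustration only  */
    const clients = new Set<WebSocket>()

    wss.on("connection", (ws) => {
        clients.add(ws)
        ws.on("message", (data) => {
            for (const peer of clients)
                if (peer !== ws && peer.readyState === WebSocket.OPEN)
                    peer.send(data.toString())
        })
        ws.on("close", () => { clients.delete(ws) })
    })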

[architecture diagram]

The core of the application can be found in the following software components:

Usage (Production)

  • Under Windows/macOS/Linux install Node.js for the server run-time, Google Chrome for the client run-time (control mode) and either OBS Studio or vMix for the client run-time (renderer mode).

  • Create and use local working copy:
    git clone https://github.com/rse/studio-ai && cd studio-ai

  • Provide the API keys for the required cloud services (see the environment-loading sketch after this list):
    echo "STUDIOAI_DEEPGRAM_API_TOKEN=\"<token1>\"" >.env
    echo "STUDIOAI_OPENAI_API_TOKEN=\"<token2>\"" >>.env
    echo "STUDIOAI_HEYGEN_API_TOKEN=\"<token3>\"" >>.env

  • Install all dependencies:
    npm install --production

  • Run the production build-process once:
    npm start build

  • Run the bare server component:
    npm start server

  • Open the client component (control mode) in Google Chrome:
    https://127.0.0.1:12345/

  • Use the client component (renderer mode) in OBS Studio or vMix browser sources:
    https://127.0.0.1:12345/#/render
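
As a hedged illustration of the .env step above, a Node.js server could load the three tokens with the dotenv package roughly as follows. The variable names come from the commands above; the fail-fast check is an illustrative addition, not the actual Studio AI startup code.

    import * as dotenv from "dotenv"

    /*  load .env into process.env (sketch of the general mechanism)  */
    dotenv.config()

    /*  the variable names match the ones written to .env above  */
    const required = [
        "STUDIOAI_DEEPGRAM_API_TOKEN",
        "STUDIOAI_OPENAI_API_TOKEN",
        "STUDIOAI_HEYGEN_API_TOKEN"
    ]
    for (const name of required)
        if (!process.env[name])
            throw new Error(`missing required environment variable: ${name}`)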

Usage (Development)

  • Under Windows/macOS/Linux install Node.js for the server run-time and Google Chrome for the client run-time (both control mode and renderer mode), plus Visual Studio Code with its TypeScript, ESLint and VueJS extensions.

  • Create and use local working copy:
    git clone https://github.com/rse/studio-ai && cd studio-ai

  • Provide the API keys for the required cloud services:
    echo "STUDIOAI_DEEPGRAM_API_TOKEN=\"<token1>\"" >.env
    echo "STUDIOAI_OPENAI_API_TOKEN=\"<token2>\"" >>.env
    echo "STUDIOAI_HEYGEN_API_TOKEN=\"<token3>\"" >>.env

  • Install all dependencies:
    npm install

  • Run the development build-process once:
    npm start build-dev

  • Run the development build-process and server component continuously:
    npm start dev

  • Open the client component (control mode) in Google Chrome:
    https://127.0.0.1:12345/

  • Open the client component (renderer mode) in Google Chrome:
    https://127.0.0.1:12345/#/render
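
The #/render URL fragment suggests that the SPA selects its mode via hash-based routing. The following sketch is a hypothetical illustration of that idea only; the function and type names are made up and do not reflect the actual Studio AI client code.

    /*  hypothetical sketch: derive the client mode from the URL hash  */
    type ClientMode = "control" | "render"

    function currentMode (): ClientMode {
        return window.location.hash.startsWith("#/render") ? "render" : "control"
    }

    /*  usage: start the matching user interface  */
    if (currentMode() === "render")
        console.log("starting autonomous avatar rendering mode")
    else
        console.log("starting interactive control mode")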

History

The Studio AI application was inspired by a prototype application from msg systems ag, which employees of its public sector division and AI cross-division initially crafted for controlling an AI avatar during the panel discussion at the Nordl@nder Digital conference in September 2024. This prototype application was based on an earlier version of the HeyGen Interactive Avatar Demo for the HeyGen Streaming API.

In October 2024 Dr. Ralf S. Engelschall, CTO of msg group, initially integrated this prototype application into his msg Filmstudio. Unfortunately, the implementation did not allow a seamless studio integration. As a result, he took only the ideas of the prototype application and developed Studio AI from scratch in order to allow a more robust integration into a studio production process.

See Also

Copyright & License

Copyright © 2024 Dr. Ralf S. Engelschall
Licensed under GPL 3.0
