The bot uses an Ollama LLM container running llama3.2 under the hood. Temporal is used to keep track of conversation state and workflows.
Ollama is a customizable LLM container. You can personalize the model in ollama/Modelfile.
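For example, a minimal Modelfile could pin the base model and set a system prompt. The values below are illustrative, not the project's actual configuration:

```
FROM llama3.2

# Sampling temperature; lower values make replies more deterministic (illustrative).
PARAMETER temperature 0.7

# System prompt that shapes the bot's persona (illustrative).
SYSTEM """You are a concise, friendly chat assistant."""
```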
On first start, the Ollama container pulls the llama3.2 model (about 2 GB), which can take some time.
make build
make up
Open a WebSocket connection to the server, passing clientID as a query parameter:
ws://localhost:3000/ws?clientID=johnny
Send a message:
{"event":"prompt_request","data":"hi"}
Check the chat history:
http://localhost:3000/history?clientID=johnny
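The same check from code, as a minimal sketch (the shape of the history payload isn't documented above, so it's printed raw):

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("http://localhost:3000/history?clientID=johnny")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(body)) // raw chat history payload
}
```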
The chat workflow is initiated when the client sends its first message; each subsequent chat message is delivered to the workflow as a workflow update. The message is handed to the LLM for processing in the background. Once the LLM activity finishes, the workflow notifies the API server (via a callback) with the LLM's response, and the API server pushes the result to the connected client's WebSocket.
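A minimal sketch of that flow with the Temporal Go SDK (assuming Go; the update name "prompt_request" mirrors the WebSocket event above, and the "AskLLM" / "NotifyAPIServer" activity names are hypothetical, not the project's actual ones):

```go
package chat

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// ChatWorkflow is started on the client's first message; later messages
// arrive as workflow updates named "prompt_request" (hypothetical name).
func ChatWorkflow(ctx workflow.Context, clientID string) error {
	ao := workflow.ActivityOptions{StartToCloseTimeout: time.Minute}
	ctx = workflow.WithActivityOptions(ctx, ao)

	err := workflow.SetUpdateHandler(ctx, "prompt_request", func(ctx workflow.Context, prompt string) error {
		// Run the LLM call as an activity.
		var reply string
		if err := workflow.ExecuteActivity(ctx, "AskLLM", prompt).Get(ctx, &reply); err != nil {
			return err
		}
		// Callback to the API server, which pushes the reply to the client's WebSocket.
		return workflow.ExecuteActivity(ctx, "NotifyAPIServer", clientID, reply).Get(ctx, nil)
	})
	if err != nil {
		return err
	}

	// Keep the workflow alive to receive further updates; a real implementation
	// would await some close condition instead of blocking forever.
	return workflow.Await(ctx, func() bool { return false })
}
```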