Telegram Local LLM is an AI-powered, fully configurable chatbot designed to integrate seamlessly into Telegram Messenger without any cloud dependencies.
The project grew out of the desire to build a single, fully local AI assistant that operates without cloud-based services and can handle multi-user conversations, background thinking, web search, and more.
In recent years, AI-powered chat assistants have become increasingly prevalent, but most solutions are controlled by companies that impose paywalls, availability limits, and usability constraints. Cloud-based AI models are often unreliable due to server downtime, while company-provided interfaces lack flexibility, do not support multi-user interactions, and are frequently designed with artificial limitations that push users toward paid tiers. Some companies even degrade model performance ahead of releases for marketing purposes, creating an unnecessary cycle of artificial scarcity.
To address these issues, `tg-local-llm` provides a fully independent, self-hosted AI assistant that runs entirely on local hardware, ensuring maximum privacy and eliminating reliance on external providers. Built on a lightweight, easily modifiable stack (Deno and TypeScript), it lets users customize and extend functionality with minimal programming experience, or none at all. The bot integrates seamlessly into Telegram Messenger, making it a handy, real-life companion for everyday tasks. Unlike traditional AI UIs, it supports long, continuous conversations, multi-user interactions, and a modular toolset that makes it easy to integrate additional capabilities.
By giving users full control over their AI assistant, `tg-local-llm` ensures unrestricted access, adaptability, and long-term reliability, making it an ideal choice for those who prioritize privacy, customization, and independence in their AI interactions.
- Supports Reasoning
- Supports Web Search (text and image)
- Supports Web Page Reading (essential part of Web Search)
- Supports multi-user conversations
- Supports images (depends on LLM capabilities)
- Responds before/along with tool usage
- Minimal censorship
- Human-like character
- Bullet-proof message structure handling
- Answers any text/caption messages in group chats when mentioned by name
- Supports long conversations via replies
- Supports quote replies
- Supports TL;DR, analysis, etc. requests by replies
- Ignores messages starting with `//` for hidden replies
- Supports continuous typing (edits message with more text)
- Various preferences
- Deno (TypeScript)
- MongoDB
- OpenAI-compatible LLM server; llama.cpp is highly recommended.
- SearXNG instance. This is optional in general, but essential for web search.
- Environment capable of running a headless browser (via Puppeteer)
- The system prompt is divided into multiple sections (a condensed assembly sketch follows this list):
- You: defines character, personality, and behavior.
- Online Chat: defines the environment of the conversation.
- Tools: defines tool usage rules.
- Provided Tools: a list of tools available to the model.
- Messages Format: defines message structure and formatting rules.
- User Messages: defines specifics of user (only) messages.
- Your (assistant) Messages: defines specifics of model (only) messages.
- Social Rules: defines a list of rules that introduce social-boundary bias and reduce censorship.
- Tools
- Web Search: the model can search for text and images on the web. This is a powerful, and I think essential, tool for any general-purpose AI: there is no other way to find real-time, up-to-date information.
- Get Text Contents: a supplement to web search, this tool allows the model to retrieve text contents from a specific URL. Required by web search to read URLs after finding them, but it can also be used to read specific URLs directly per user request.
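Because preferences can adjust the prompt (see the devlog below), the prompt is best regenerated on every request. Here is a minimal sketch of how such a sectioned prompt could be assembled; all section contents and type names are invented for illustration, and the real builder lives in `src/services/model`:

```typescript
// Hypothetical sketch: assembling the sectioned system prompt.
type ChatPreferences = {
  nsfw?: boolean;
  state?: string; // e.g. "lazy", set via `/ai extremely lazy`
};

function buildSystemPrompt(prefs: ChatPreferences, tools: string[]): string {
  const sections: Record<string, string> = {
    "You": "You are a helpful, human-like assistant...",
    "Online Chat": "You are talking in a multi-user online chat...",
    "Tools": "Use tools only when needed...",
    "Provided Tools": tools.join("\n"),
    "Messages Format": "Wrap every part of your response in its section...",
    "User Messages": "User messages are prefixed with sender names...",
    "Your Messages": "Your messages must follow the section format...",
    "Social Rules": "You are talking to adults who set their own boundaries...",
  };
  // Preferences modify the prompt instead of being stored as messages,
  // which is why the prompt is regenerated every time
  if (prefs.state) {
    sections["You"] += `\nYou are currently extremely ${prefs.state}.`;
  }
  if (prefs.nsfw) {
    sections["Social Rules"] += "\nNSFW roleplay is allowed.";
  }
  return Object.entries(sections)
    .map(([title, body]) => `# ${title}\n${body}`)
    .join("\n\n");
}
```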
- I use llama.cpp to load the LLM and run inference. Since the llama.cpp server is OpenAI-compatible, you should be able to use `tg-local-llm` with any OpenAI-compatible API.
- I use the grammY framework to handle the Telegram API.
- I created basic controllers to handle incoming messages in groups: one for text/caption messages and one for per-chat preferences (a condensed handler sketch follows this list).
- I created a simple API service to work with the LLM API. The model and context length are set by the server; the service manages output structure, streaming, and parsing (a parser sketch follows this list).
- At this point, a basic full-text match of the bot's name in a message is enough for the model to respond. All messages are grouped into threads (sequences of replies) and stored in the database for further context building. This already implements basic communication with the model and long conversations.
- Next, I had to introduce tools, mainly for web search. Sending tool calls along with the message text (and potentially other pieces of data) in a single response is only possible with a strictly defined response format. This is handled in two steps. First, I describe all so-called sections (such as message, tool, etc.) in the System Prompt, with examples; this teaches the model the structure of the response. This could work well on its own, but sometimes the model misuses sections (writes custom sections, uses wrong characters, nests sections, etc.), so I also leverage Structured Outputs by writing a strict grammar for the response format. Given this, the model is technically unable to break the format.
- The tricky part, or the "σ̌ solution". Running it as-is, the model will almost never respond in the proper format. This is because the grammar contains something similar to `<message_start> [any_character] <message_end>`. Since grammars are not lazy, when the model generates `<message_end>` it is treated as part of `[any_character]`, so the model is not forced to stop. Torn between the grammar requirement and its own sense that it has already finished, it produces an insane amount of semi-random text. The simple solution is to pick a barely used character, such as `σ̌`, and use it as a wrapper for section tags, then replace `[any_character]` with `[any_character_except_σ̌]`. This way, whenever the model writes `σ̌`, it is handled as part of a required section tag, since it cannot belong to the "any character" part. Later I changed it to `≪` (much less than) and `≫` (much greater than) to avoid introducing another alphabet into responses, which can make the model switch languages for no reason (a grammar sketch follows this list).
- Having implemented a reliable tool usage structure, I built two tools: `search_web` and `read_article`. The first uses a locally running SearXNG instance to retrieve a list of relevant links for a given `query`; as a response, the model receives a bullet list of `source_url` and `title` pairs. The second uses a headless browser to evaluate `document.body.innerText`, essentially extracting all text from the web page. The result is passed to a separate LLM call (a summarizer) with a request to summarize the contents and remove metadata; the summary is then given to the main (chat-context-aware) model to respond. The tool is usually used after `search_web` or when users ask to read a specific URL. Bonus point: to avoid (or rather minimise) robot checks and rejections on websites, I add a custom User-Agent and some headers, which works much better (see `src/services/browser.ts` and the tools sketch after this list).
- Given that, we have a few more possibilities. First, I added `category: text|image` to the `search_web` tool, which allows the model to search for images. An additional `image` section is used by the model to provide a direct image URL, which is then used by the client. I also updated the structure so that the model writes the `tool_call` section before the `message` section; this means the model can describe what it is doing with tools, and the client can show that to the user before the tool response and the actual model response arrive.
- Simple thinking (reasoning) can now be easily implemented by adding a compulsory `thoughts` section before the `message` section. To make it meaningful, the model is required to include `User Response`, `Reasoning`, and `Next Steps` sections within the thoughts tag.
- In addition, I add a `tool_guide` section after `tool_response` with instructions on what to do with a specific tool response. For example, with text search, the guide section requires the model to select a source and use the `get_text_content` tool to read it. For image search, the guide prohibits extracting text and requires providing one of the images in the response.
- Sometimes it is handy to keep permanent long-term memory notes, so I added basic memory instructions and a `/ai remember` command to add a note. A next step, and a great improvement, would be a simple `memory` tool so that the model can read and write memory notes itself.
- To provide a much nicer experience, I introduced per-chat preferences, such as NSFW roleplay, "extreme state", and message limit (context length) notes. These basically modify the system prompt, adjusting model behavior. For instance, you can use `/ai extremely lazy` to make the model behave like a lazy person: internally, it injects an instruction to behave this way and disables some conflicting instructions. Given this, I recommend generating the system prompt every time rather than storing it as a message in the database (the prompt-assembly sketch above illustrates this).
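Condensed sketches of the pieces above follow. First, the group message controller: a minimal, hypothetical grammY handler that matches the bot's name and continues reply threads. The environment variable name, bot name, and reply logic are assumptions for illustration; the real controllers live in the repository.

```typescript
// Hypothetical sketch of the group text/caption controller (grammY).
import { Bot, Context } from "https://deno.land/x/grammy/mod.ts";

const BOT_NAME = "Assistant"; // assumed to be configurable

const bot = new Bot(Deno.env.get("BOT_TOKEN")!); // variable name assumed

bot.on(["message:text", "message:caption"], async (ctx: Context) => {
  const text = ctx.message?.text ?? ctx.message?.caption ?? "";
  // Hidden replies: ignore messages starting with //
  if (text.startsWith("//")) return;
  // A basic full-text match of the bot's name triggers a response;
  // a reply to one of the bot's messages continues the same thread
  const mentioned = text.toLowerCase().includes(BOT_NAME.toLowerCase());
  const isReplyToBot = ctx.message?.reply_to_message?.from?.id === ctx.me.id;
  if (!mentioned && !isReplyToBot) return;
  // Store the message in its thread, build context, call the model...
  await ctx.reply("(model response goes here)", {
    reply_parameters: { message_id: ctx.message!.message_id },
  });
});

bot.start();
```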
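Next, the response-format grammar. This is a minimal sketch of the GBNF idea from the "σ̌ solution" bullet, reduced to two sections and sent to llama.cpp's native `/completion` endpoint, which accepts a `grammar` field. Rule names and the endpoint URL are simplified assumptions; the real grammar lives in `src/services/model`.

```typescript
// Hypothetical, simplified GBNF grammar. The key detail: the free-text
// rule excludes the ≪ delimiter, so a closing tag can never be swallowed
// by the "any character" rule and generation is forced to stop.
const grammar = String.raw`
root     ::= thoughts message
thoughts ::= "≪thoughts≫" text "≪/thoughts≫"
message  ::= "≪message≫" text "≪/message≫"
text     ::= [^≪]*
`;

// llama.cpp's native /completion endpoint accepts `grammar`; a plain
// OpenAI-compatible server would need its own constrained-decoding feature.
const response = await fetch("http://localhost:8080/completion", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    prompt: "<system prompt + conversation here>",
    grammar,
    stream: false,
  }),
});
const { content } = await response.json();
console.log(content);
```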
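The streaming parser then splits the model output into sections as tokens arrive; this is what enables "continuous typing" (repeatedly editing the Telegram message with more text) and responding before tool results come back. A hypothetical sketch:

```typescript
// Hypothetical sketch: extract the current contents of a section
// from a partially streamed response.
function extractSection(buffer: string, name: string): string | undefined {
  const open = `≪${name}≫`;
  const start = buffer.indexOf(open);
  if (start === -1) return undefined;
  const from = start + open.length;
  const end = buffer.indexOf(`≪/${name}≫`, from);
  // The section may still be streaming; return what we have so far
  return end === -1 ? buffer.slice(from) : buffer.slice(from, end);
}

// As chunks arrive, keep editing the Telegram message with the
// partial `message` section (editMessageText call omitted here).
let buffer = "";
for (const chunk of ["≪message≫Hel", "lo there!≪/message≫"]) {
  buffer += chunk;
  console.log(extractSection(buffer, "message")); // "Hel", "Hello there!"
}
```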
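Finally, the two tools. A condensed, hypothetical sketch of `search_web` and the page reader, assuming a local SearXNG instance with JSON output (`format=json`) enabled in its settings; the real implementations live in `src/services/tools` and `src/services/browser.ts`.

```typescript
// Hypothetical sketch of the two tools described above.
import puppeteer from "npm:puppeteer";

const SEARXNG_URL = "http://localhost:8888"; // assumed local instance

async function searchWeb(query: string): Promise<string> {
  const url =
    `${SEARXNG_URL}/search?q=${encodeURIComponent(query)}&format=json`;
  const { results } = await (await fetch(url)).json();
  // The model receives a bullet list of source_url and title
  return results
    .slice(0, 5)
    .map((r: { url: string; title: string }) => `- ${r.title}: ${r.url}`)
    .join("\n");
}

async function readArticle(url: string): Promise<string> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // A realistic User-Agent and headers reduce robot checks and rejections
    await page.setUserAgent("Mozilla/5.0 (X11; Linux x86_64)");
    await page.goto(url, { waitUntil: "networkidle2" });
    // Extract all visible text; a separate summarizer LLM call condenses
    // it before the main, chat-context-aware model responds
    return await page.evaluate(() => document.body.innerText);
  } finally {
    await browser.close();
  }
}
```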
- Use `deno task start` to run.
- Use `sudo ./make-service.bash service_name service_description` to create a SystemD service for background running (runs only `tg-local-llm`).
- Use `sudo ./make-service.bash service_name service_description llamacpp_home model_path` to create a SystemD service for background running (runs only the llama.cpp server).
- See `.env.example` for general adjustments
- See `src/services/model` for API, grammar, message building, and prompt
- See `src/services/tools` for tools
- See `src/services/formatting` for formatting and parsing
- See `types/database.ts` for custom preferences
- Big thanks to all open-source LLM developers
- Thanks to Ollama developers
- Thanks to LLama.cpp developers
- Powered by ExposedCat Dev
The repository is licensed under GPL-3.0.