Stars
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
An open-source AI agent that brings the power of Gemini directly into your terminal.
Voice Recognition to Text Tool: a local audio/video-to-subtitle tool that runs offline and outputs JSON, SRT subtitles, and plain text
A powerful tool for creating fine-tuning datasets for LLMs
Effortless data labeling with AI support from Segment Anything and other awesome models.
🚀 One-stop solution for creating your digital avatar from chat history 💡 Fine-tune LLMs with your chat logs to capture your unique style, then bind to a chatbot to bring your digital self to life. …
Model Context Protocol Servers
Arduino library to play MOD, WAV, FLAC, MIDI, RTTTL, MP3, and AAC files on I2S DACs or with a software-emulated delta-sigma DAC on the ESP8266, ESP32, and Pico
RooCodeInc / Roo-Code
Forked from cline/cline
Roo Code (prev. Roo Cline) gives you a whole dev team of AI agents in your code editor.
Hi all, this is flight controller code for the ESP32, written in the Arduino IDE. There are test files for each component; follow the schematic.
Convert Netease Cloud Music ncm files to mp3/flac files.
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal is…
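百聆 is a GPT-4o-style voice conversation bot built from ASR + LLM + TTS. It integrates strong large models such as DeepSeek R1, has latency as low as 800 ms, runs on low-spec machines such as a Mac, and supports interruption.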
Backend service for xiaozhi-esp32 that helps you quickly build an ESP32 device control server.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Use large AI models to automatically provide commentary and edit videos with a single click.
Open-source, accurate, and easy-to-use video speech recognition and clipping tool, with LLM-based AI clipping integrated.
SGLang is a fast serving framework for large language models and vision language models.
A high-throughput and memory-efficient inference and serving engine for LLMs
Official inference framework for 1-bit LLMs
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks