Stars
Awesome multilingual OCR and document parsing toolkits based on PaddlePaddle (a practical ultra-lightweight OCR system that supports recognition of 80+ languages and provides data annotation and synthesis tools,…
[NAACL'25] TEaR framework for paper "TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement"
translate-pptx is a command line tool that translates PowerPoint PPTX files from one language to another.
MixTeX: multimodal LaTeX, ZhEn, and table OCR. It performs efficient CPU-based inference locally and offline on Windows.
A video translation and dubbing tool powered by LLMs, offering professional-grade translations and one-click full-process deployment. It can generate content optimized for platforms like YouTube, T…
Open-source and strong foundation image recognition models.
Detect-anything (zero-shot detection + recognition) demo for SG2300X [Recognize Anything + GroundingDINO]
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Effortless data labeling with AI support from Segment Anything and other awesome models.
Code for paper "MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning"
A real-time speech-to-speech chatbot powered by Whisper Small, Llama 3.2, and Kokoro-82M.
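Retrieves WeChat account information; reads the database so chat history can be viewed locally and exported to CSV, HTML, and other formats for AI training, automatic replies, and similar uses. Supports retrieving information from multiple accounts and supports all WeChat versions.
An intelligent WeChat chat data management and analysis tool with a modern UI and friendly interaction, built with Electron (vue3+ts) and Python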
This is an Electron quick-start template built on Vite, Vue 3, and UnoCSS.
CSerialPort - lightweight cross-platform serial port library for C++/C/C#/Java/Python/Node.js/Electron/Rust
Talk To AI with FastRTC enables natural, real-time voice conversations with AI using WebRTC, offering customizable voices, interfaces, and local or cloud-based API integration.
The Python library for real-time communication
Project Page repo of OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication
LiberSonora, meaning "The Voice of Freedom," is an AI-powered, robust, open-source audiobook toolkit. It includes intelligent subtitle extraction, AI title generation, multilingual translation, and more, and supports GPU acceleration and batch offline processing.
Next-Gen AI Translation Tool Powered by LLM. Supports Office documents, PDF, TXT, and more formats with just one click.
A text extraction library supporting PDFs, images, office documents and more
A nearly-live implementation of OpenAI's Whisper.
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal is…
LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds