Stars
Officially recommended ChatTTS resource collection project, gathering related resources from across the web and frequently asked questions
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS…
FlashMLA: Efficient MLA decoding kernels
Formula recognition based on LaTeX-OCR and ONNXRuntime.
Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"
MixTeX: multimodal LaTeX, ZhEn (Chinese-English), and table OCR. It performs efficient CPU-based inference locally and offline on Windows.
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Effortless data labeling with AI support from Segment Anything and other awesome models.
A series of math-specific large language models of our Qwen2 series.
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
A generative speech model for daily dialogue.
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A Gradio web UI for Large Language Models with support for multiple inference backends.
Open-Sora: Democratizing Efficient Video Production for All
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Foundational Models for State-of-the-Art Speech and Text Translation
This is the official code for the MobileSAM project, which makes SAM lightweight for mobile applications and beyond!
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Efficient vision foundation models for high-resolution generation and perception.
This project uses a variety of advanced voiceprint recognition models such as EcapaTdnn, ResNetSE, ERes2Net, CAM++, etc. More models may be supported in the future. At the …