Stars
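This project contains a Python script for separating the voices of different speakers in two-person (or multi-person) conversational podcast audio files. It uses the `pyannote.audio` library to perform speaker diarization, determining "who spoke when", and extracts each speaker's speech segments into a separate audio track.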
🚀 The fast, Pythonic way to build MCP servers and clients
A powerful tool for creating fine-tuning datasets for LLMs
DeepEP: an efficient expert-parallel communication library
Janus-Series: Unified Multimodal Understanding and Generation Models
✨ Light and Fast AI Assistant. Support: Web | iOS | MacOS | Android | Linux | Windows
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
Vanilla JS web interface for Gemini 2.0 flash-exp Multimodal API with text, audio, camera, screen inputs and audio responses and function calling
🤯 Lobe Chat - an open-source, modern design AI chat framework. Supports multiple AI providers (OpenAI / Claude 4 / Gemini / DeepSeek / Ollama / Qwen), Knowledge Base (file upload / RAG ), one click…
A high-quality, one-stop open-source data extraction tool for converting PDF to Markdown and JSON.
FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, le…
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.