Starred repositories
Control your Mac with detailed mouse, keyboard, screen, and window management capabilities.
Get started with building Fullstack Agents using Gemini 2.5 and LangGraph
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…
🌐 WebWalker [ACL2025] & WebDancer [Preprint]
Chrome Extensions Samples
HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
智能闲鱼客服机器人系统:专为闲鱼平台打造的AI值守解决方案,实现闲鱼平台7×24小时自动化值守,支持多专家协同决策、智能议价和上下文感知对话。
一个基于 Python + FastAPI + Playwright + Camoufox 的代理服务器,兼容 OpenAI API 支持参数转发设置,将请求转发到 Google AI Studio 网页版的对话,并同样按照标准格式返回输出的工具。课余时间有限,周更
A TTS model capable of generating ultra-realistic dialogue in one pass.
Allow LLMs to control a browser with Browserbase and Stagehand
Real Time Speech Transcription with FastRTC ⚡️and Local Whisper 🤗
A video translation and dubbing tool powered by LLMs, offering professional-grade translations and one-click full-process deployment. It can generate content optimized for platforms like YouTube,T…
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Easily and securely send things from one computer to another 🐊 📦
直播带货工具,支持抖音小店、巨量百应、抖音团购、小红书千帆、视频号、快手小店平台,能自动弹窗,自动发言,AI助力回复
自动化上传视频到社交媒体:抖音、小红书、视频号、tiktok、youtube、bilibili
(2025最新版) 一个基于Netty的通用直播间弹幕客户端,支持网络代理,支持浏览器模式*,支持弹幕发送*、为主播点赞*,已支持B站、斗鱼、虎牙、抖音、快手;BarrageFly——让弹幕飞,基于该项目的一个弹幕转发、过滤、处理平台;支持多平台直播间弹幕监听(弹幕抓取 弹幕爬取 弹幕爬虫)
A ComfyUI custom node designed for advanced image background removal and object, face, clothes, and fashion segmentation, utilizing multiple models including RMBG-2.0, INSPYRENET, BEN, BEN2, BiRefN…
[CVPR 2025] Learning Flow Fields in Attention for Controllable Person Image Generation
[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Toolkit for linearizing PDFs for LLM datasets/training