Stars
✨✨VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Port of the Festival-lite (Flite TTS) speech-synthesis engine to Android
eSpeak NG is an open-source speech synthesizer that supports more than a hundred languages and accents.
Generate high-definition short videos with one click using large AI models.
Android Transition animations explanation with examples.
Image doodle for Android, with functions such as undo, zoom, move, text, image, etc. Also a powerful, customizable and extensible doodle framework and multi-function drawing board. Android image doodle, with undo, zoom…
Code to accompany "A Method for Animating Children's Drawings of the Human Figure"
Automate Creation of YouTube Shorts using MoviePy.
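🔥 A Chinese translation of the public-apis project that also collects APIs commonly used in China. Stars 🌟 and contributions of useful APIs are welcome, to help make this project a comprehensive Chinese-language collection of free APIs.
There can be more than Notion and Miro. AFFiNE (pronounced [ə‘fain]) is a next-gen knowledge base that brings planning, sorting and creating all together. Privacy first, open-source, customizable an…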
Stay on top of trending topics on social media and the web with AI
🎧 Open source music client! Available for both desktop & mobile!
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
⚡️SwanLab - an open-source, modern-design AI training tracking and visualization tool. Supports Cloud / Self-hosted use. Integrated with PyTorch / Transformers / LLaMA Factory / Swift / Ultralytics…
⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool, built on a lightweight algorithm for producing ID photos.
Moxin is a family of fully open-source and reproducible LLMs
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
Generic automation framework for acceptance testing and RPA
Zero-shot voice conversion and singing voice conversion, with real-time support
[IJCV2024] Exploiting Diffusion Prior for Real-World Image Super-Resolution
ECCV18 Workshops - Enhanced SRGAN. Champion of the PIRM Challenge on Perceptual Super-Resolution. The training code is in BasicSR.
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
Have a natural, spoken conversation with AI!
Docker build for FFmpeg on Ubuntu / Alpine / CentOS / Scratch / NVIDIA / VAAPI
CUDA integration for Python, plus shiny features