Starred repositories
🚀 "Large Model": Train a 26M-parameter visual multimodal VLM from scratch in just 1 hour!
Python tool for converting files and office documents to Markdown.
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Code for paper "MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning"
No fortress, purely open ground. OpenManus is Coming.
A simple screen parsing tool towards a pure-vision-based GUI agent
This is a collection of resources for computer-use GUI agents, including videos, blogs, papers, and projects.
Toolkit for linearizing PDFs for LLM datasets/training
A plan to extend ChatHaruhi into a zero-shot role-playing model
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
OCR, layout analysis, reading order, table recognition in 90+ languages
Official implementation of paper "Query2Label: A Simple Transformer Way to Multi-Label Classification".
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
A realtime serving engine for Data-Intensive Generative AI Applications
DetToolChain: A new prompting paradigm to unleash the detection ability of MLLMs
A High-Quality Real Time Upscaler for Anime Video
Official Code for ICCV 2021 paper "Towards Flexible Blind JPEG Artifacts Removal (FBCNN)"
A more practical frame interpolation approach.
APISR: Anime Production Inspired Real-World Anime Super-Resolution (CVPR 2024)
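Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments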