alanMachineLeraning (abcalan) / Starred · GitHub

alanMachineLeraning

🤒

Out sick

3 followers · 4 following

Hahahaha
China

Lists (32)

AI removal

Caption competitions

grounding

15 repositories

LLM surveys

Traditional NLP understanding

Traditional OCR

Other

Image editing

Multimodal large models

12 repositories

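Multimodal reasoning and thinking models

Multimodal understanding + image editing

Multimodal video understanding

Large model fine-tuning

Large language models

LLM chain-of-thought

Fun applications

Reinforcement learning

Inference acceleration frameworks

Text inpainting

Text-to-video retrieval

Text-to-image generation

Model layer visualization

Multi-layer poster generation

Traceability codes and QR codes

Neural network slides (PPT)

Pure-vision segmentation, detection, and recognition

Representations

Video processing tools

Video temporal grounding

Training frameworks

Quantization

Audio tools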

Stars

1rgs / jsonformer

A Bulletproof Way to Generate Structured JSON from Language Models

Jupyter Notebook · 4,756 stars · 180 forks · Updated Feb 24, 2024

ByteDance-Seed / Bagel

Open-source unified multimodal model

Python · 4,360 stars · 364 forks · Updated Jun 17, 2025

stepfun-ai / Step1X-Edit

A SOTA open-source image editing model that aims to provide performance comparable to closed-source models like GPT-4o and Gemini 2 Flash.

Python · 1,458 stars · 62 forks · Updated Jun 26, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python · 15,511 stars · 2,201 forks · Updated Jun 27, 2025

si0wang / ThinkLite-VL

Python · 84 stars · 6 forks · Updated Jun 10, 2025

baaivision / Emu3

Next-Token Prediction is All You Need

Python · 2,156 stars · 81 forks · Updated Mar 17, 2025

RedAIGC / Flux-version-LayerDiffuse

Python · 187 stars · 11 forks · Updated May 9, 2025

joanrod / star-vector

StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and te…

Python · 3,925 stars · 207 forks · Updated Apr 15, 2025

microsoft / art-msra

[CVPR 2025] Official repo for ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Jupyter Notebook · 314 stars · 36 forks · Updated Jun 17, 2025

SkyworkAI / Skywork-R1V

Skywork-R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning

Python · 2,645 stars · 251 forks · Updated Jun 10, 2025

erwold / qwen2vl-flux

Python · 534 stars · 30 forks · Updated Nov 26, 2024

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 10,056 stars · 1,655 forks · Updated Jun 27, 2025

modelscope / ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…

Python · 8,327 stars · 717 forks · Updated Jun 27, 2025

hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python · 53,071 stars · 6,499 forks · Updated Jun 27, 2025

mistralai / mistral-finetune

Python · 2,973 stars · 277 forks · Updated Sep 13, 2024

ArrowLuo / CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Python · 964 stars · 133 forks · Updated Apr 12, 2024

xuguohai / X-CLIP

An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"

Python · 163 stars · 17 forks · Updated Apr 6, 2024

adxcreative / EERCF

Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning

Python · 17 stars · 1 fork · Updated Feb 19, 2025

huangb23 / VTimeLLM

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".

Python · 280 stars · 12 forks · Updated Jun 13, 2024

RenShuhuai-Andy / TimeChat

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Python · 381 stars · 34 forks · Updated May 8, 2025

hlchen23 / ADPN-MM

Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Grounding"

Python · 50 stars · 2 forks · Updated Dec 30, 2023

AI-Application-and-Integration-Lab / SAM4MLLM

[ECCV 2024] SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation

Jupyter Notebook · 30 stars · 3 forks · Updated Mar 20, 2025

rkzheng99 / ViLLa

Video Reasoning Segmentation

21 stars · Updated Nov 29, 2024

zamling / PSALM

[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"

Python · 240 stars · 12 forks · Updated Dec 30, 2024

dvlab-research / LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Python · 2,270 stars · 162 forks · Updated Feb 16, 2025

FoundationVision / Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Python · 569 stars · 44 forks · Updated Jun 7, 2024

lxtGH / OMG-Seg

OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]

Python · 1,303 stars · 50 forks · Updated May 30, 2025

congvvc / HyperSeg

Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model".

Python · 153 stars · 3 forks · Updated Dec 13, 2024

SkyworkAI / Vitron

NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Python · 550 stars · 35 forks · Updated Oct 20, 2024

congvvc / InstructSeg

Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models"

Python · 41 stars · 2 forks · Updated Feb 10, 2025
