PetrosKataras (Petros Kataras) / Starred · GitHub

SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning

Python 37 2 Updated Jun 19, 2025

Code for the Molmo Vision-Language Model

Python 543 39 Updated Dec 12, 2024

Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2

Jupyter Notebook 2,391 258 Updated May 26, 2025

[CVPR 2025] "A Distractor-Aware Memory for Visual Object Tracking with SAM2"

Python 340 25 Updated Jun 26, 2025

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 1,378 77 Updated May 28, 2025

[CVPR25] Official repository for the paper: "SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation"

Python 279 13 Updated Jun 28, 2025

UFM: A Unified Dense Image Correspondence Estimator for both Optical Flow & Wide Baseline Matching Tasks. Matches any pair of images.

Python 181 3 Updated Jun 14, 2025
Python 40 2 Updated Jun 10, 2025

This repository is a curated collection of the most exciting and influential CVPR 2025 papers. 🔥 [Paper + Code + Demo]

Python 650 35 Updated Jun 16, 2025

an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM

Rust 15,467 1,284 Updated Jul 4, 2025

✨ An advanced 3D Gaussian Splatting renderer for THREE.js

TypeScript 744 40 Updated Jul 4, 2025

[CVPR 2025] Official PyTorch implementation of "EdgeTAM: On-Device Track Anything Model"

Jupyter Notebook 469 32 Updated Apr 30, 2025

Official repository for "Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment"

Python 683 67 Updated Jun 2, 2025

The official code implementation of the paper "OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data."

Python 364 24 Updated Jun 8, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,287 50 Updated Jun 14, 2025

[SIGGRAPH 2025] PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer

Python 329 12 Updated May 13, 2025

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Python 280 17 Updated May 15, 2025

Removes reflections quickly and easily.

Python 24 1 Updated Feb 10, 2024

A unified library for object tracking featuring clean room re-implementations of leading multi-object tracking algorithms

Python 1,785 153 Updated Jul 2, 2025

[CVPR 2025] UniK3D: Universal Camera Monocular 3D Estimation

Python 543 33 Updated Jun 11, 2025

YOLOE: Real-Time Seeing Anything [ICCV 2025]

Python 1,400 125 Updated Jun 26, 2025

Big & Small LLMs working together

Python 1,046 116 Updated Jul 4, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,251 253 Updated Jun 12, 2025

RF-DETR is a real-time object detection model architecture developed by Roboflow, SOTA on COCO & designed for fine-tuning.

Python 2,310 254 Updated Jul 3, 2025

[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

Python 9,405 898 Updated Jul 4, 2025

Streamlined interface for generating images with AI in Krita. Inpaint and outpaint with optional text prompt, no tweaking required.

Python 8,805 464 Updated Jun 30, 2025

The official Python SDK for Model Context Protocol servers and clients

Python 15,626 1,967 Updated Jul 4, 2025

🚀 The fast, Pythonic way to build MCP servers and clients

Python 13,953 854 Updated Jul 4, 2025

Official examples and tools from the JACK project

C 44 17 Updated Jul 7, 2024

[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

Python 1,744 131 Updated May 30, 2025