- Carnegie Mellon University
- Pittsburgh
- anuragxel.github.io
- @anuragxel
Stars
PLUTO: Push the Limit of Imitation Learning-based Planning for Autonomous Driving
[ICLR 2025 Oral] The official implementation of "Diffusion-Based Planning for Autonomous Driving with Flexible Guidance"
pySLAM is a Python-based Visual SLAM pipeline that supports monocular, stereo, and RGB-D cameras. It offers a wide range of modern local and global features, multiple loop-closing strategies, a vol…
A fast and simple structure from motion pipeline written in PyTorch.
Official implementation of Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
[CVPR 2025] AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis
Infinite Photorealistic Worlds using Procedural Generation
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Official implementation of Inductive Moment Matching
An image retrieval model for any localization task
1 million FPS multi-agent driving simulator
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
Universal Monocular Metric Depth Estimation
Instance-Level Image Warping for Domain Adaptation
Inverse Painting: Reconstructing The Painting Process (SIGGRAPH ASIA 2024)
[ECCV 2024] Official PyTorch implementation of RoPE-ViT "Rotary Position Embedding for Vision Transformer"
detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
A framework to easily use 32 (and growing) different image matching methods
Run DROID-SLAM with Metric3D to improve monocular performance
A 3DGS framework for omni urban scene reconstruction and simulation.
Uses Unreal Engine & Cesium to generate large synthetic datasets from 3D meshes. Enables machine learning tasks like Visual Place Recognition. Read more in our paper: https://meshvpr.github.io
Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
Algorithmically create or extend categorical colour palettes
Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
Ongoing research on training Gaussian splatting at scale on distributed systems