https://nctimtang.github.io/tangxi.github.io/
Stars
Implementation of paper "CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis"
Fully open reproduction of DeepSeek-R1
Official Implementation for our NeurIPS 2024 paper, "Don't Look Twice: Run-Length Tokenization for Faster Video Transformers".
A One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
LAVIS - A One-stop Library for Language-Vision Intelligence
[ICCV 2023] Generative Prompt Model for Weakly Supervised Object Localization
[ECCV 2024] ControlCap: Controllable Region-level Captioning
[CVPR 2025] DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding
YOLOv12: Attention-Centric Real-Time Object Detectors
CVPR2024, Semantic-aware SAM for Point-Prompted Instance Segmentation
The official Python toolkit for running experiments and evaluating performance on the VideoCube benchmark @TPAMI2023
Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"
Solve puzzles. Improve your pytorch.
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
EVA Series: Visual Representation Fantasies from BAAI
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
An efficient pytorch implementation of selective scan in one file, works with both cpu and gpu, with corresponding mathematical derivation. It is probably the code which is the most close to select…
Deep and online learning with spiking neural networks in Python
[AAAI2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues