- Keio University
- Yokohama, Japan
- https://chakio.github.io/Portfolio/
- https://orcid.org/0000-0002-3076-3258
Stars
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
MambaGlue: Fast and Robust Local Feature Matching With Mamba @ ICRA'25
OmniGibson: a platform for accelerating Embodied AI research built upon NVIDIA's Omniverse engine. Join our Discord for support: https://discord.gg/bccR5vGFEx
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting
SpatialLM: Large Language Model for Spatial Understanding
Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Programming samples in Python and C++ for the tiscamera GStreamer modules.
3D visualization library for rapid prototyping of 3D algorithms
Python library to control the Elgato Stream Deck.
USB camera driver based on ros_control
Encoding/decoding image transport for ROS. Only supports decoding H.264 for now.
Faster Whisper transcription with CTranslate2
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model