- Singapore University of Technology and Design
- Singapore
- https://weiyan-shi.github.io
- in/shiweiyan
Stars
OmniSVG is the first family of end-to-end multimodal SVG generators that leverage pre-trained Vision-Language Models (VLMs), capable of generating complex and detailed SVGs, from simple icons to in…
Virtual whiteboard for sketching hand-drawn like diagrams
Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
A simple and elegant Jekyll theme for an academic personal homepage
Documentation on how to access and use the Quick, Draw! Dataset.
Magenta: Music and Art Generation with Machine Intelligence
PyTorch implementation of DiffSketching: Sketch Control Image Synthesis with Diffusion Models, BMVC 2022
Displays the China Computer Federation (CCF) recommended rank of international conferences and journals in the dblp, Google Scholar, Connected Papers, and Web of Science search results.
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
[IEEE SPL] End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
This project aims to collect the latest "call for reviewers" links from various top CS/ML/AI conferences/journals
🚀🚀 [Large Language Model] Train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏
Acceptance rates for the major AI conferences
A PyTorch Implementation of PGL-SUM from "Combining Global and Local Attention with Positional Encoding for Video Summarization" (IEEE ISM 2021)
Graph learning framework for long-term video understanding
Unsupervised video summarization with deep reinforcement learning (AAAI'18)
AAAI 2018 - Unsupervised video summarization with deep reinforcement learning (Theano)
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
Evaluation code for Dense-Captioning Events in Videos
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…