Catalyx Space
- luqmanzaceria.github.io
- in/luqman-zaceria
Stars
Reproducing Yann LeCun's 1989 paper "Backpropagation Applied to Handwritten Zip Code Recognition", to my knowledge the earliest real-world application of a neural net trained with backpropagation.
A list of awesome and diverse datasets related to space vehicle engineering for industry and research.
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
PyTorch implementation of Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities.
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Official implementation of "Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound"
A template repository for creating a waitlist using Next.js 14, Notion as a CMS, Upstash Redis for rate limiting and Resend for sending emails with a custom domain.
🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day🤞
A modern, highly customizable, responsive Jekyll template for course websites.
YouTube Subtitle Downloader downloads subtitles from YouTube videos (when present) and converts them to SRT format.
Official PyTorch Implementation of SMIRK: 3D Facial Expressions through Analysis-by-Neural-Synthesis (CVPR 2024)
All lecture notes and assignments for CS231n: Convolutional Neural Networks for Visual Recognition class by Stanford
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
DelinQu / SimplerEnv-OpenVLA
Forked from simpler-env/SimplerEnv
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo, and OpenVLA) in simulation under common setups (e.g., Google Robot, WidowX+Bridge)
A tab for sd-webui for replacing objects in pictures or videos using a detection prompt
[CVPR 2024] BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
Video-Inpaint-Anything: This is the inference code for our paper CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility.
Inpaint anything using Segment Anything and inpainting models.
Official codebase for "Any-point Trajectory Modeling for Policy Learning"
[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
A generative speech model for daily dialogue.
Online React video editor built with Remotion. A CapCut and Canva clone.