entn-at (Ewald Enzinger) / Starred · GitHub

Ewald Enzinger entn-at

Ph.D. EE (UNSW Sydney). ML, speaker recognition, speech recognition, speech synthesis, forensic voice comparison

112 followers · 310 following

Lists (1)

✨ Inspiration

Starred repositories

apple / ml-fastvlm

This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025

Python 3,747 186 Updated May 5, 2025

F3Set / F3Set

Python 6 1 Updated Oct 2, 2024

netbirdio / netbird

Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.

Go 13,686 649 Updated May 22, 2025

bachhavpramod / bandwidth_extension

MATLAB 55 21 Updated Jul 5, 2022

mjpieters / aiolimiter

An efficient implementation of a rate limiter for asyncio.

Python 628 30 Updated Mar 31, 2025

Respaired / RiFornet_Vocoder

A neural vocoder supporting Ring Attention, Conformer and NSF.

Python 18 2 Updated Feb 14, 2025

kyutai-labs / moshi-finetune

Python 220 13 Updated Apr 3, 2025

davda54 / sam

SAM: Sharpness-Aware Minimization (PyTorch)

Python 1,874 205 Updated Feb 21, 2024

pacscilab / voxcommunis

HTML 8 1 Updated Mar 31, 2025

exercise-book-yq / FreeCodec

FreeCodec: A Disentangled Neural Speech Codec with Fewer Tokens

20 Updated Sep 9, 2024

ShoukanLabs / VoPho

A collection of all our phonemizers for dataset construction and inference

Python 22 2 Updated Feb 21, 2025

kehanlu / DeSTA2

Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"

HTML 88 5 Updated Feb 20, 2025

natolambert / rlhf-book

Textbook on reinforcement learning from human feedback

TeX 907 79 Updated May 15, 2025

ucbepic / docetl

A system for agentic LLM-powered data processing and ETL

Python 1,962 187 Updated May 21, 2025

Qualcomm-AI-research / bcresnet

Python 62 13 Updated May 31, 2023

zhenye234 / X-Codec-2.0

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 267 33 Updated Mar 12, 2025

linjac / GenDARA

Python 12 1 Updated Jan 14, 2025

VITA-MLLM / VITA

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,301 169 Updated Mar 28, 2025

k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS…

C++ 6,058 696 Updated May 22, 2025

pytorch / torchtitan

A PyTorch native platform for training generative AI models

Python 3,824 375 Updated May 21, 2025

akanametov / yolo-face

Forked from ultralytics/ultralytics

YOLO Face 🚀 in PyTorch

Python 456 41 Updated Mar 14, 2025

unixpickle / learn-ptx

Learning about CUDA by writing PTX code.

Python 129 4 Updated Feb 27, 2024

huutuongtu / skd-ctc

Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

Python 6 Updated Sep 25, 2024

FL33TW00D / coremlprofiler

Profile your CoreML models directly from Python 🐍

Python 27 3 Updated Oct 15, 2024

gpt-omni / mini-omni2

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Python 1,751 194 Updated Jan 16, 2025

IINemo / lm-polygraph

Python 270 38 Updated May 14, 2025

rsprouse / xray_microbeam_database

Annotations and scripts for use with University of Wisconsin X-Ray Microbeam Speech Production Database (1994)

Jupyter Notebook 13 1 Updated Oct 8, 2020

facebookresearch / spiritlm

Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".

Python 903 59 Updated Oct 28, 2024

Berkeley-Speech-Group / Speech-Articulatory-Coding

Jupyter Notebook 33 8 Updated Feb 7, 2025

interactiveaudiolab / penn

Pitch Estimating Neural Networks (PENN)

Python 253 24 Updated Apr 2, 2025

Starred topics

speech

vocoder
