Stars
No fortress, purely open ground. OpenManus is Coming.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition
PyTorch implementation of the paper "High Fidelity Speech Regeneration With Application to Speech Enhancement"