dahshury/WinSTT: A Windows app to type using a customizable hotkey, utilizing OpenAI's Whisper and a nice GUI



WinSTT

A desktop speech-to-text (STT) application using OpenAI's Whisper

Type in any application using your voice. WinSTT uses OpenAI's Whisper speech-to-text model for efficient voice typing: it transcribes your speech into text, supports 99 languages, and can run locally without an internet connection once the model has been downloaded.

Why

The built-in Windows speech-to-text is slow, inaccurate, and unintuitive. WinSTT offers customizable hotkey activation and fast, accurate transcription for rapid typing, which is especially useful for writing articles, blog posts, and even chat conversations.

Setup

Precompiled Binary (Recommended for Windows Users)

  • Download the .exe file from the latest release in the Releases section.

Python Version Setup

Install Dependencies

  • First, clone the repo:

    git clone https://github.com/dahshury/WinSTT
  • Navigate to the cloned directory:

    cd WinSTT
  • Initialize the environment and install the requirements:

    CPU version:
    conda env create -f env.yaml
    GPU version:
    conda env create -f env-gpu.yaml
    Linux users only: additional setup for PyAudio

    On Linux, you also need to install PortAudio, which PyAudio depends on. The commands below install it, together with a set of X11/XCB libraries:

    • Debian/Ubuntu:
      sudo apt update
      sudo apt install portaudio19-dev libxcb1 libxcb-cursor0 libxcb-keysyms1 libxcb-render0 libxcb-shape0 libxcb-shm0 libxcb-xfixes0 libxcb-icccm4 libxcb-image0 libxcb-sync1 libxcb-xinerama0 libxcb-randr0 libxcb-util1 libx11-xcb1 libxrender1 libxkbcommon-x11-0
  • Activate the environment:

    conda activate WinSTT
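If you are unsure which environment file to use, the helper below is a minimal sketch that picks one for you. It assumes that finding NVIDIA's `nvidia-smi` tool on your PATH is a good enough sign of a usable CUDA GPU; that is a heuristic, not a check WinSTT itself performs, and `pick_env_file` is a hypothetical name.

```python
import shutil
from typing import Callable, Optional


def pick_env_file(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return "env-gpu.yaml" if nvidia-smi is on PATH, else "env.yaml".

    Finding nvidia-smi only suggests that an NVIDIA driver is installed;
    it does not guarantee that the GPU environment will work.
    """
    return "env-gpu.yaml" if which("nvidia-smi") else "env.yaml"


# Print the matching command instead of running it, so you can review it first.
print("conda env create -f " + pick_env_file())
```

The `which` parameter defaults to `shutil.which` (which also resolves `nvidia-smi.exe` on Windows) and can be swapped for a stub when testing.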

Start The App

  • Start the GUI by running:
    python winSTT.py
  • Alternatively, run the utils/listener.py script, which contains the default functionality:
    python -m utils.listener

Usage

Hold the Alt+Ctrl+A key combination to record, and release it to stop. There may be a slight delay between pressing the keys and the app starting to listen to your microphone, so start speaking only after you hear the audio cue.
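The hold-to-record behaviour can be modelled as a small state machine: recording starts once every key of the combination is held and stops as soon as any of them is released. This is an illustrative sketch only; `HoldToRecord`, `parse_hotkey`, and the `on_start`/`on_stop` callbacks are hypothetical names, not WinSTT's actual code, and connecting it to real keyboard events would require a global-hotkey library.

```python
from typing import Callable, FrozenSet, Set


def parse_hotkey(combo: str) -> FrozenSet[str]:
    """Turn a string like "Alt+Ctrl+A" into a normalized set of key names."""
    return frozenset(part.strip().lower() for part in combo.split("+") if part.strip())


class HoldToRecord:
    """Start recording when the whole combo is held; stop when any key is released."""

    def __init__(self, combo: str, on_start: Callable[[], None],
                 on_stop: Callable[[], None]) -> None:
        self.combo = parse_hotkey(combo)
        self.pressed: Set[str] = set()
        self.recording = False
        self.on_start = on_start  # hypothetical hook: begin capturing audio
        self.on_stop = on_stop    # hypothetical hook: stop and transcribe

    def key_down(self, key: str) -> None:
        self.pressed.add(key.lower())
        # Start only once the full combination is held; key repeat is ignored.
        if not self.recording and self.combo <= self.pressed:
            self.recording = True
            self.on_start()

    def key_up(self, key: str) -> None:
        self.pressed.discard(key.lower())
        # Releasing any key of the combination ends the recording.
        if self.recording and not self.combo <= self.pressed:
            self.recording = False
            self.on_stop()


events = []
hotkey = HoldToRecord("Alt+Ctrl+A", lambda: events.append("start"),
                      lambda: events.append("stop"))
for k in ("ctrl", "alt", "a", "a"):  # the second "a" mimics key repeat
    hotkey.key_down(k)
hotkey.key_up("a")
print(events)  # ['start', 'stop']
```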

  • Releasing the keys transcribes the recorded audio and pastes the text at your text cursor, in any application. Processing speed depends on the chosen model and your computer's hardware.

  • The app has a "record key" button for changing the key combination you hold to record. Click record key, press and hold the keys you want to use, then click stop to save the new recording key.

  • This tool is powered by Hugging Face ASR models, primarily OpenAI's Whisper. Larger models are more accurate but slower, so try the model that best suits your hardware and needs.
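One common way to type text wherever your cursor is: put it on the clipboard, send the paste shortcut to the focused window, then restore the previous clipboard contents. The README does not say how WinSTT pastes, so treat this as an assumed approach; `ClipboardPaster` and its callables are hypothetical placeholders for a clipboard library and a keyboard-simulation library.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClipboardPaster:
    """Paste text at the cursor via the clipboard, then restore the old contents.

    The three callables are hypothetical placeholders: in a real app they
    would wrap a clipboard library and a library that sends Ctrl+V to the
    focused window.
    """

    get_clipboard: Callable[[], str]
    set_clipboard: Callable[[str], None]
    send_paste_shortcut: Callable[[], None]
    restore_delay: float = 0.1  # seconds to wait before restoring (assumed value)

    def paste(self, text: str) -> None:
        previous = self.get_clipboard()
        self.set_clipboard(text)
        try:
            self.send_paste_shortcut()
            # Give the target application time to read the clipboard.
            time.sleep(self.restore_delay)
        finally:
            # Put the user's original clipboard contents back, even on errors.
            self.set_clipboard(previous)


# Usage with in-memory fakes instead of a real clipboard and keyboard:
clipboard = {"text": "user's old clipboard"}
typed = []
paster = ClipboardPaster(
    get_clipboard=lambda: clipboard["text"],
    set_clipboard=lambda t: clipboard.__setitem__("text", t),
    send_paste_shortcut=lambda: typed.append(clipboard["text"]),
    restore_delay=0.0,
)
paster.paste("Hello from WinSTT")
print(typed)              # ['Hello from WinSTT']
print(clipboard["text"])  # user's old clipboard
```

The `restore_delay` exists because the target application may read the clipboard slightly after the shortcut is sent; restoring too early could make it paste the old contents instead.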

Notes

  • When you launch the app for the first time, wait for the model files to download (about 1 GB for the CPU version and 3 GB for the GPU version); how long this takes depends on your internet connection. Once the model is downloaded, no internet connection is needed unless you change the model. The first recording may then be pasted slightly more slowly than subsequent ones.
  • The app automatically detects whether the recording contains speech. If it does not, or if an error occurs, the app shows a message and also writes it to the logs folder.
  • The application only records while the record key is held down.
  • The app can run on a CPU, in which case it uses a quantized Whisper-Turbo model by default. If you have a CUDA-capable GPU, the app runs the full (unquantized) model instead, which is faster and more accurate; this is highly recommended.
  • Recordings shorter than 0.5 seconds are not transcribed. For short phrases, keep holding the keys until at least 0.5 seconds have passed.
  • Some antivirus programs may flag executables built with PyInstaller, such as the current releases, as suspicious. This is a known issue; rest assured, the binaries are clean and safe. The app passes most of VirusTotal's checks, which you can check out here; the remaining detections are false positives.
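The 0.5-second minimum and the "no speech" check can be sketched as a gate that runs before transcription. The 0.5 s limit comes from the notes above; the RMS energy test and the `SILENCE_RMS` threshold are assumptions standing in for whatever detection WinSTT actually uses (a real app might use a voice-activity detector instead), and `should_transcribe` is a hypothetical name. The default 16 kHz rate matches the audio rate Whisper models expect.

```python
import math
from typing import Sequence

MIN_DURATION_S = 0.5  # from the notes: shorter recordings are not transcribed
SILENCE_RMS = 0.01    # assumed threshold for float samples in [-1.0, 1.0]


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of the samples (0.0 for an empty recording)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_transcribe(samples: Sequence[float], sample_rate: int = 16_000) -> bool:
    """Return True only for recordings that are long enough and not silent."""
    duration = len(samples) / sample_rate
    if duration < MIN_DURATION_S:
        return False
    return rms(samples) >= SILENCE_RMS


sr = 16_000
tone = [0.1 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s, 440 Hz
print(should_transcribe(tone[: sr // 4], sr))  # False: only 0.25 s long
print(should_transcribe([0.0] * sr, sr))       # False: silence
print(should_transcribe(tone, sr))             # True
```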

Acknowledgments
