A desktop application that allows you to record audio and transcribe it using OpenAI's Whisper model. The application runs locally and can be triggered using keyboard shortcuts.
- Record audio using keyboard shortcuts
- Transcribe audio using Whisper.cpp
- Automatic text pasting after transcription
- Desktop notifications for recording status
- Configurable audio settings and keyboard shortcuts
- Install Rust and Cargo from rustup.rs
- On Linux, install the required system dependencies (see the example after this list):
- ALSA development libraries (for audio recording)
- X11 development libraries (for keyboard input)
- OpenSSL development libraries (for model downloading)
- Build and run (macOS, Metal acceleration):
cargo run --release --features metal
- Build and run (Linux, CUDA acceleration):
cargo run --release --features cuda,x11 # Untested
cargo run --release --features cuda,wayland
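On Debian/Ubuntu, for example, the system dependencies listed above can be installed with apt (package names are the standard Debian ones and may differ on other distributions):

```sh
# ALSA (audio capture), X11 (keyboard input), OpenSSL (model download)
sudo apt-get install libasound2-dev libx11-dev libssl-dev pkg-config
```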
# Clone the repository
git clone https://github.com/yourusername/whispering.git
cd whispering
# Build with Nix
nix build
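If the flake exposes a default app (not verified here), `nix run` builds and launches the application in one step:

```sh
nix run
```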
The application can be configured using a TOML file located at `~/.config/whispering/config.toml`. If no configuration file exists, the application will create one with default values.
# Whispering Configuration Example
#
# Copy this file to ~/.config/whispering/config.toml and modify the values as needed.
# If no config.toml is found, the application will use default values.
[audio]
# Number of audio channels (1 for mono, 2 for stereo)
channels = 1
# Sample rate in Hz
sample_rate = 16000
# Bits per sample
bits_per_sample = 32
# Sample format (float or int)
sample_format = "float"
[model]
# Hugging Face model repository
repo = "ggerganov/whisper.cpp"
# Model filename
filename = "ggml-base.en.bin"
# Prompt configuration.
# Options for `type` are "none", "vocabulary", and "raw".
# prompt = { type = "vocabulary", vocabulary = ["Google", "HuggingFace"] }
# will insert "Google, HuggingFace" as an initial prompt.
#
# prompt = { type = "raw", prompt = "HuggingFace likes Google" }
# will insert "HuggingFace likes Google" as an initial prompt.
#
# For more information on whisper prompting:
# https://cookbook.openai.com/examples/whisper_prompting_guide
prompt = { type = "none" }
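# Text replacements applied to the transcribed output before it is pasted.
# The exact matching rules are application-defined; the commented example
# below is illustrative only.
# replacements = { "hugging face" = "HuggingFace" }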
replacements = {}
[paths]
# Cache directory for storing temporary files
cache_dir = "~/.cache/whispering"
# Path to the recorded audio file
recording_path = "~/.cache/whispering/recorded.wav"
[shortcuts]
# Keys that need to be pressed in sequence to start recording
# Available keys: control, alt, shift, super, space, enter, r, and many others
keys = ["ControlLeft", "Space"]
# Automatically press Enter after the text is pasted (so it sends the message in most chat contexts).
autosend = false
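To customize the settings, save a copy of the example above to the config path and edit it:

```sh
mkdir -p ~/.config/whispering
$EDITOR ~/.config/whispering/config.toml
```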
- `channels`: Number of audio channels (1 for mono, 2 for stereo)
- `sample_rate`: Sample rate in Hz (default: 16000)
- `bits_per_sample`: Bits per sample (default: 32)
- `sample_format`: Sample format ("float" or "int")

- `repo`: Hugging Face model repository
- `filename`: Model filename to download and use
- `prompt`: Initial prompt configuration; `type` can be "none", "vocabulary", or "raw" (see the example above)
- `replacements`: Optional text replacements applied to the transcription (empty by default)

- `cache_dir`: Directory for storing temporary files
- `recording_path`: Path to save recorded audio files

- `keys`: List of keys to press in sequence to start recording
  - Available keys: control, alt, shift, super, space, enter, and many others
  - Default: `["ControlLeft", "Space"]`
- `autosend`: Automatically press Enter after the transcribed text is pasted (default: false)
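As an illustration (not taken from the project), a `[model]` section that switches to the multilingual base model and adds a vocabulary prompt could look like this. `ggml-base.bin` is one of the files published in the ggerganov/whisper.cpp repository; whether partially specified sections fall back to defaults is not verified here, so when in doubt start from the full example above.

```toml
[model]
repo = "ggerganov/whisper.cpp"
# Multilingual model instead of the English-only base.en
filename = "ggml-base.bin"
# Bias transcription towards domain-specific terms
prompt = { type = "vocabulary", vocabulary = ["Whisper", "Rust"] }
```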
- Start the application
- Press the configured shortcut keys (default: Control + Space) to start recording
- Release the last key to stop recording
- The transcribed text will be automatically pasted into the active window
- Ensure your microphone is properly connected and selected as the default input device (see the quick check after this list)
- Check that your system's audio permissions are properly configured
- Verify that the ALSA development libraries are installed
- Make sure the X11 development libraries are installed
- Check that your window manager supports the keyboard shortcuts you've configured
- Ensure you have sufficient disk space for the model (the default ggml-base.en.bin is roughly 150 MB; larger models can exceed 1.5 GB)
- Check your internet connection for model downloading
- Verify that the model repository and filename are correct
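On Linux with ALSA, a quick way to confirm that a capture device is visible is to list recording devices (requires the alsa-utils package):

```sh
# The microphone should show up as a capture device
arecord -l
```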
This project is licensed under the MIT License - see the LICENSE file for details.