Audio capture → OpenAI Whisper → Markdown. That's it.
Records audio from your mic when you hit F8, stops when you hit F8 again, then transcribes it. It tries OpenAI's Whisper API first and automatically falls back to local faster-whisper if the API fails or the file is too large. Transcriptions are saved as markdown files with timestamps.
# Install the dependencies
pip install -r requirements.txt
# If PyAudio fails on Windows (it often does)
pip install pipwin
pipwin install pyaudio
# Add your OpenAI API key to a .env file
echo "OPENAI_API_KEY=your_actual_key_here" > .env
# Run it
python transcription.py
The first time you run it, you'll need to pick your mic. It'll remember your choice in audio_config.json.
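Remembering the mic can be as simple as a small JSON file. Here is a minimal sketch, assuming audio_config.json stores one device index under a hypothetical "device_index" key; the real file's layout may differ, and both helper names are made up for illustration.

```python
import json
from pathlib import Path

CONFIG_PATH = Path("audio_config.json")


def load_device_index(path=CONFIG_PATH):
    """Return the saved device index, or None if nothing usable is saved."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None  # first run, or the file is corrupt: ask the user again
    if not isinstance(data, dict):
        return None
    index = data.get("device_index")  # hypothetical key name
    return index if isinstance(index, int) else None


def save_device_index(index, path=CONFIG_PATH):
    """Persist the chosen device index so the next run can skip the prompt."""
    payload = json.dumps({"device_index": index}, indent=2)
    Path(path).write_text(payload, encoding="utf-8")
```

Returning None for a missing or unreadable file means the caller only needs one check to decide whether to show the mic picker again.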
- F8: Start/stop recording (you'll see audio levels in real-time)
- M: Show menu options
- ESC: Quit
The menu (press M) lets you:
- Start/stop recording (same as F8)
- View your saved transcriptions
- Toggle auto-opening files when done
- Exit
- transcription.py: Main program with smart API/local fallback
- quick_transcribe.py: Manual tool for processing failed recordings
- device_finder.py: Detects/selects audio input devices
- audio_config.json: Saves which mic you're using
- transcription_config.json: Audio settings and preferences
- .env: Your OpenAI API key
- transcriptions/: Where your markdown files go
Edit transcription_config.json if you want to change settings:
{
"format": 8, // Audio format (8 = pyaudio.paInt16)
"channels": 1, // Mono audio
"rate": 16000, // Sample rate (Hz)
"chunk": 1024, // Processing chunk size
"hotkey": "f8", // Key to start/stop recording
"output_dir": "transcriptions", // Where files get saved
"language": "en", // Language for transcription
"auto_open": false, // Automatically open files when saved
"min_duration": 1.0 // Minimum recording duration in seconds
}
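Python's json module rejects // comments, so this sketch assumes the file on disk leaves them out. It shows one way the settings might be loaded, with missing keys falling back to the values listed above. load_config is a hypothetical helper, not necessarily what transcription.py does.

```python
import json
from pathlib import Path

# Defaults mirror the values listed above.
DEFAULTS = {
    "format": 8,
    "channels": 1,
    "rate": 16000,
    "chunk": 1024,
    "hotkey": "f8",
    "output_dir": "transcriptions",
    "language": "en",
    "auto_open": False,
    "min_duration": 1.0,
}


def load_config(path="transcription_config.json"):
    """Merge user overrides from the JSON file on top of the defaults."""
    config = dict(DEFAULTS)
    file = Path(path)
    if file.exists():
        overrides = json.loads(file.read_text(encoding="utf-8"))
        # Ignore unknown keys so a typo cannot silently add a new setting.
        config.update({k: v for k, v in overrides.items() if k in DEFAULTS})
    return config
```

With this approach the config file only needs to list the settings you actually change, for example just "rate" or "language".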
If your mic isn't working:
- Run python device_finder.py to see all available audio devices
- Delete audio_config.json to reset your device preference
- Run the main script again and pick a different mic
- Records audio frames when F8 is pressed
- Shows audio levels while recording
- When F8 is pressed again, saves audio to temp WAV file
- Checks file size - if >25MB, uses local transcription
- Tries OpenAI Whisper API first
- If API fails, automatically falls back to local faster-whisper
- Formats the returned text and saves as markdown
- Shows transcription in terminal and saves to file
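The last two steps might look something like this. It is a sketch only: the filename pattern, the heading line, and the save_markdown helper are assumptions, not the script's actual output format.

```python
from datetime import datetime
from pathlib import Path


def save_markdown(text, output_dir="transcriptions", now=None):
    """Write the transcript to a timestamped .md file and return its path."""
    now = now or datetime.now()
    folder = Path(output_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: transcription_YYYYMMDD_HHMMSS.md
    path = folder / f"transcription_{now:%Y%m%d_%H%M%S}.md"
    body = f"# Transcription {now:%Y-%m-%d %H:%M:%S}\n\n{text.strip()}\n"
    path.write_text(body, encoding="utf-8")
    return path
```

Putting the timestamp in the filename keeps files sorted chronologically in the transcriptions folder without any extra index.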
Smart fallback means you still get a transcript when the API can't be used. If the API is down, your quota is exceeded, or the file is too big, local whisper kicks in automatically.
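The fallback decision can be sketched roughly as below. The two transcriber functions are passed in as plain callables and stand in for the real OpenAI and faster-whisper calls. The 25MB threshold comes from the steps above; the function and constant names are assumptions.

```python
import os

MAX_API_BYTES = 25 * 1024 * 1024  # files above this size skip the API


def transcribe_with_fallback(wav_path, api_transcribe, local_transcribe):
    """Return (text, backend), where backend is "api" or "local"."""
    if os.path.getsize(wav_path) > MAX_API_BYTES:
        # Too large for the API: go straight to the local model.
        return local_transcribe(wav_path), "local"
    try:
        return api_transcribe(wav_path), "api"
    except Exception as exc:  # network error, quota exceeded, bad key, ...
        print(f"API transcription failed ({exc}); falling back to local model")
        return local_transcribe(wav_path), "local"
```

Note that if the local model also raises, the exception propagates to the caller, which is where saving a failed recording for later would fit.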
If something goes wrong, failed recordings get saved as failed_recording_*.wav in the transcriptions folder. Use quick_transcribe.py to manually process them:
python quick_transcribe.py
This finds the most recent failed recording and transcribes it locally.
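Picking the most recent failed recording could be done by file modification time. This is a sketch under that assumption; latest_failed_recording is a hypothetical helper, and quick_transcribe.py may choose the file differently (for example by the timestamp in the name).

```python
from pathlib import Path


def latest_failed_recording(folder="transcriptions"):
    """Return the newest failed_recording_*.wav in folder, or None."""
    candidates = list(Path(folder).glob("failed_recording_*.wav"))
    if not candidates:
        return None
    # Newest by modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```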
- Add desktop screen capture via live button
- Maybe a highlighter tool? Or cut out a section and add the timestamp of the cut
- Ability to grab and link pages/apps/locations to text
- A better way to store all the transcripts?
- An overlay that is invisible to any video/call app?