Speech Note

Linux desktop and Sailfish OS app for note taking, reading and translating with offline Speech to Text, Text to Speech and Machine Translation

Contents of this README

Description
Languages and Models
How to install
Flatpak packages
Beta version
Extra features
Building from sources
How to enable a custom model
Contributing to Speech Note
How to support
Libraries
Reviews and demos
License

Description

Speech Note lets you take, read and translate notes in multiple languages. It uses Speech to Text, Text to Speech and Machine Translation to do so. Text and voice processing take place entirely offline, locally on your computer, without using a network connection. Your privacy is always respected. No data is sent to the Internet.

Speech Note uses many different processing engines to do its job. Currently, the following are used:

Speech to Text (STT)
- Coqui STT
- Vosk
- whisper.cpp
- faster-whisper
- April-ASR
Text to Speech (TTS)
- espeak-ng
- MBROLA
- Piper
- RHVoice
- Coqui TTS
- Mimic 3
- WhisperSpeech
- Kokoro
- Parler-TTS
- F5-TTS
- S.A.M.
Machine Translation (MT)
- Bergamot Translator

Languages and Models

The Speech Note installation package does not include checkpoint files for the supported models. Instead, they can be easily downloaded using the graphical model browser built into the application.

The following languages and models are supported and enabled for download:

Lang ID	Name	DeepSpeech (STT)	Whisper (STT)	Vosk (STT)	April-ASR (STT)	Piper (TTS)	RHVoice (TTS)	espeak (TTS)	MBROLA (TTS)	Coqui (TTS)	Mimic3 (TTS)	WhisperSpeech (TTS)	Kokoro (TTS)	F5-TTS	Parler-TTS	S.A.M. (TTS)	Bergamot (MT)
af	Afrikaans		●					●			●
am	Amharic	● (e)	●					●		●
ar	Arabic		●	●		●		●	●	●							●
bg	Bulgarian		●					●		●
bn	Bengali		●					●		●	●
bs	Bosnian		●					●									●
ca	Catalan	●	●	●		●		●		●							●
cs	Czech	●	●	●		●	●	●	●	●							●
cy	Welsh					●
da	Danish		●			●		●		●							●
de	German	●	●	●		●		●		●	●	●			●(e)		●
el	Greek	● (e)	●			●		●		●	●						●
en	English	●	●	●	●	●	●	●		●	●	●	●	●	●	●	●
eo	Esperanto			●			●	●
es	Spanish	●	●	●		●	●	●		●	●	●	●		●(e)		●
et	Estonian	● (e)	●					●	●	●							●
eu	Basque	● (e)	●					●		●
fa	Persian	●	●	●		●		●	●	●	●						●
fi	Finnish	●	●			●		●		●	●						●
fr	French	●	●	●	●	●		●		●	●	●	●		●(e)		●
ga	Irish							●		●
gu	Gujarati		●					●			●
ha	Hausa		●								●
he	Hebrew		●							●
hi	Hindi		●	●				●					●
hr	Croatian		●				●	●	●	●
hu	Hungarian	● (e)	●			●		●	●	●	●						●
id	Indonesian	● (e)	●					●	●	●							●
is	Icelandic		●			●		●		●							●
it	Italian	●	●	●		●		●		●	●	●	●		●(e)		●
ja	Japanese		●	●				●		●			●				●
jv	Javanese		●								●
ka	Georgian		●			●	●	●
kk	Kazakh		●	●		●		●		●
ko	Korean		●	●				●		●							●
ky	Kyrgyz						●	●
la	Latin							●		●
lb	Luxembourgish					●
lt	Lithuanian		●					●	●	●							●
lv	Latvian	●	●			●		●		●							●
mk	Macedonian		●				●	●
mn	Mongolian	● (e)	●							●
mr	Marathi		●							●
ms	Malay		●					●	●	●
mt	Maltese		●					●		●
ne	Nepali		●			●		●			●
nl	Dutch	● (e)	●	●		●		●		●	●	●			●(e)		●
no	Norwegian		●			●		●									●
pl	Polish	●	●	●	●	●	●	●	●	●	●	●			●(e)		●
pt	Portuguese	● (e)	●	●		●		●	●	●			●		●(e)		●
ro	Romanian	● (e)	●			●		●	●	●							●
ru	Russian	●	●	●		●	●	●			●						●
sk	Slovak		●			●	●	●		●							●
sl	Slovenian	● (e)	●			●		●		●							●
sq	Albanian		●				●	●		●
sr	Serbian		●			●	●	●									●
sv	Swedish		●	●		●		●	●	●	●						●
sw	Swahili	●	●			●		●		●
te	Telugu		●					●			●
th	Thai	● (e)	●					●		●
tl	Tagalog		●	●						●
tn	Tswana		●					●			●
tr	Turkish	● (e)	●	●		●		●	●	●							●
tt	Tatar		●				●	●		●
uk	Ukrainian	●	●	●		●	●	●		●	●						●
uz	Uzbek		●	●				●		●
vi	Vietnamese		●	●		●		●		●							●
yo	Yoruba	● (e)	●							●	●
zh	Chinese	●	●	●		●		●		●			●	●			●

(e) experimental, most likely doesn't work well

Faster Whisper, Coqui TTS and Mimic3 models are only available on x86-64.

Language models can be downloaded directly from the app.

Details of models which are currently configured for download are described in models.json (GitHub) or models.json (GitLab).

How to install

Linux Desktop: Flatpak

# Flatpak base package
flatpak install net.mkiol.SpeechNote

# Optional NVIDIA add-on package
flatpak install net.mkiol.SpeechNote.Addon.nvidia

# Optional AMD add-on package
flatpak install net.mkiol.SpeechNote.Addon.amd

Arch Linux (AUR):
- dsnote
- dsnote-git
openSUSE (Packman repository)

# Base package
zypper in speechnote

# Optional support for Python-based features in Speech Note
zypper in speechnote-python-modules

Sailfish OS: OpenRepos

Flatpak packages

The app distributed via Flatpak (published on Flathub) consists of the following packages:

Base package "Speech Note" (net.mkiol.SpeechNote)
Optional add-on for NVIDIA graphics card "Speech Note NVIDIA" (net.mkiol.SpeechNote.Addon.nvidia)
Optional add-on for AMD graphics card "Speech Note AMD" (net.mkiol.SpeechNote.Addon.amd)

The base package includes all the dependencies needed to run every feature of the application. The add-ons add GPU acceleration, which speeds up some operations in the application.

The base package and add-ons contain many "heavy" libraries, such as CUDA, ROCm, Torch and other Python libraries. As a result, both the download size and the disk space required after installation are significant. If you don't need all the features, you can use the much smaller "Tiny" package (available on the Releases page), which provides only the basic features. If needed, the "Tiny" package can also be combined with a GPU acceleration add-on.

Comparison between Base, Tiny and Add-ons Flatpak packages:

Sizes	Base	Tiny	AMD add-on	NVIDIA add-on
Download size	1.2 GiB	48 MiB	+7.6 GiB	+4.3 GiB
Unpacked size	3.6 GiB	170 MiB	+25.6 GiB	+6.6 GiB

Features	Base	Tiny	AMD add-on	NVIDIA add-on
Coqui/DeepSpeech STT	+	+
Vosk STT	+	+
Whisper (whisper.cpp) STT	+	+
Whisper (whisper.cpp) STT OpenCL ROCm	-	-	+
Whisper (whisper.cpp) STT OpenCL NVIDIA	+	+
Whisper (whisper.cpp) STT ROCm	-	-	+
Whisper (whisper.cpp) STT CUDA	-	-		+
Whisper (whisper.cpp) STT OpenVINO	+	-
Whisper (whisper.cpp) STT Vulkan	+	+
FasterWhisper STT	+	-
FasterWhisper STT CUDA	-	-		+
April-ASR STT	+	+
eSpeak TTS	+	+
MBROLA TTS	+	+
Piper TTS	+	+
RHVoice TTS	+	+
Coqui TTS	+	-
Coqui TTS ROCm	-	-	+
Coqui TTS CUDA	-	-		+
Mimic3 TTS	+	-
WhisperSpeech TTS	+	-
WhisperSpeech TTS ROCm	-	-	+
WhisperSpeech TTS CUDA	-	-		+
Kokoro TTS	+	-
Kokoro TTS ROCm	-	-	+
Kokoro TTS CUDA	-	-		+
Parler-TTS	+	-
Parler-TTS ROCm	-	-	+
Parler-TTS CUDA	-	-		+
F5-TTS	+	-
F5-TTS ROCm	-	-	+
F5-TTS CUDA	-	-		+
S.A.M. TTS	+	+
Punctuation restoration	+	-
Translator	+	+
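To estimate the total footprint of a particular combination, add the add-on sizes (marked with +) to the base or Tiny package. The sketch below does that arithmetic with the numbers from the table above. It is only a rough estimate that assumes sizes simply add up; Flatpak may share runtimes or deduplicate files, so real numbers can differ. The names `DOWNLOAD_GIB`, `UNPACKED_GIB` and `total_size` are illustrative.

```python
# Package sizes from the comparison table, in GiB (MiB values converted).
MIB_IN_GIB = 1 / 1024

DOWNLOAD_GIB = {"base": 1.2, "tiny": 48 * MIB_IN_GIB, "amd": 7.6, "nvidia": 4.3}
UNPACKED_GIB = {"base": 3.6, "tiny": 170 * MIB_IN_GIB, "amd": 25.6, "nvidia": 6.6}


def total_size(packages, table):
    """Sum the sizes (GiB) of the selected packages; assumes sizes simply add up."""
    return sum(table[name] for name in packages)


if __name__ == "__main__":
    for combo in (["base", "nvidia"], ["base", "amd"], ["tiny", "nvidia"]):
        download = total_size(combo, DOWNLOAD_GIB)
        unpacked = total_size(combo, UNPACKED_GIB)
        print(f"{' + '.join(combo)}: ~{download:.1f} GiB download, ~{unpacked:.1f} GiB unpacked")
```

For example, Base plus the NVIDIA add-on comes to about 5.5 GiB to download and about 10.2 GiB unpacked.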

Beta version

In addition to the stable version in the Flathub repository, you can test the "Beta" version of the upcoming release. This version is usable, but may contain more bugs.

The Beta version is available in the "flathub-beta" repository. Follow these instructions to enable flathub-beta on your computer.

Extra features

Command-line options

The command-line interface is primarily intended for integration with the desktop when Speech Note is already running (for example, hidden in the system tray or in the background).

Examples

List all supported options:

flatpak run net.mkiol.SpeechNote --help

Start listening:

flatpak run net.mkiol.SpeechNote --action start-listening

Cancel any already started action:

flatpak run net.mkiol.SpeechNote --action cancel

Start listening, the decoded text will be saved to the clipboard:

flatpak run net.mkiol.SpeechNote --action start-listening-clipboard

Start listening, the decoded text will be inserted into any window on the desktop on which the cursor is focused:

flatpak run net.mkiol.SpeechNote --action start-listening-active-window

Start reading "Hello, how are you doing?":

flatpak run net.mkiol.SpeechNote --action start-reading-text --text "Hello, how are you doing?"

Save speech of "Hello, how are you doing?" to "speech.mp3" file:

flatpak run net.mkiol.SpeechNote --action start-reading-text --text "Hello, how are you doing?" --output-file speech.mp3

List all available TTS models:

flatpak run net.mkiol.SpeechNote --print-available-models tts
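If you want to trigger these actions from your own script (for example, a custom hotkey daemon), a small wrapper can build the same command lines. This is a hypothetical helper, not part of Speech Note: `speechnote_cmd` and `run_speechnote` are illustrative names, it only uses the options shown in the examples above, and it assumes the Flatpak version is installed.

```python
import shlex
import subprocess

APP_ID = "net.mkiol.SpeechNote"  # Flathub application ID


def speechnote_cmd(action=None, text=None, output_file=None, extra=()):
    """Build a Speech Note command line from the options used in the examples above."""
    cmd = ["flatpak", "run", APP_ID]
    if action is not None:
        cmd += ["--action", action]
    if text is not None:
        cmd += ["--text", text]
    if output_file is not None:
        cmd += ["--output-file", output_file]
    cmd += list(extra)  # e.g. ["--print-available-models", "tts"]
    return cmd


def run_speechnote(**kwargs):
    """Run the command; requires Flatpak and an installed Speech Note."""
    return subprocess.run(speechnote_cmd(**kwargs), check=True)


if __name__ == "__main__":
    # Only print the command, so this part works without Flatpak installed.
    cmd = speechnote_cmd(action="start-reading-text",
                         text="Hello, how are you doing?",
                         output_file="speech.mp3")
    print(shlex.join(cmd))
```

Building an argument list (instead of one shell string) avoids quoting problems when the text contains spaces or punctuation. To actually run an action, call e.g. `run_speechnote(action="start-listening-clipboard")`.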

Global keyboard shortcuts

Global keyboard shortcuts allow you to start listening or reading with the keyboard, even when the application is not active (e.g. minimized, hidden in the system tray or just in the background).

To enable and customize keyboard shortcuts, go to Settings->Accessibility->Use Global Keyboard Shortcuts.

For shortcuts to work under Wayland, your desktop environment must support the GlobalShortcuts interface of the XDG Desktop Portal service. Currently, GlobalShortcuts is supported only by the latest KDE Plasma and GNOME desktops.

When the XDG Desktop Portal is used to manage global shortcuts, customize the key bindings with your desktop environment's settings tool.

Insert into active window

Using global keyboard shortcuts or command-line actions, you can directly start listening and insert the decoded text into any window that is currently in focus. This allows you to use Speech Note as a voice typing tool on the desktop.

Under X11, this feature should work right out of the box.

Under Wayland, the external ydotool daemon must be installed and running for this feature to work. If you are using Flatpak, also make sure that the application has permission to access the ydotool daemon's socket file.
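Before relying on this feature under Wayland, it can help to check that the ydotool daemon's socket actually exists. The sketch below follows ydotool's own convention as an assumption (the socket path comes from the `YDOTOOL_SOCKET` environment variable, falling back to `/tmp/.ydotool_socket`, the default in many ydotool 1.x builds; check `ydotoold --help` on your system). It does not read any Speech Note setting, and the helper names are hypothetical.

```python
import os
import stat

# Assumption: ydotool's client/daemon convention, not a Speech Note setting.
DEFAULT_YDOTOOL_SOCKET = "/tmp/.ydotool_socket"


def ydotool_socket_path(env=None):
    """Socket path from $YDOTOOL_SOCKET, falling back to the assumed default."""
    env = os.environ if env is None else env
    return env.get("YDOTOOL_SOCKET", DEFAULT_YDOTOOL_SOCKET)


def is_socket(path):
    """True if `path` exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


if __name__ == "__main__":
    path = ydotool_socket_path()
    if is_socket(path):
        print(f"ydotool socket found: {path}")
    else:
        print(f"no socket at {path}; is ydotoold running?")
```

Run this on the host. The Flatpak sandbox may not see the same files as the host, so a socket that exists on the host may still need to be made accessible to the app.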

Building from sources

Arch Linux

It is also possible to build and install the latest development (git) or latest stable (release) version from the repository using the provided PKGBUILD file (note that the remarks in the "Linux (direct build)" section also apply):

git clone <git repository url>

cd dsnote/arch/git      # build latest git version
# or
cd dsnote/arch/release  # build latest release version

makepkg -si

RHEL/Fedora/Rocky Linux

It is also possible to build and install the latest development version from the repository using the provided SPEC file and helper make_rpm.sh script:

git clone <git repository url>

cd dsnote/fedora

# optionally install build dependencies
dnf install rpmdevtools autoconf automake boost-devel cmake git kf5-kdbusaddons-devel libarchive-devel libxdo-devel libXinerama-devel libxkbcommon-x11-devel libXtst-devel libtool meson openblas-devel patchelf pybind11-devel python3-devel python3-pybind11 qt5-linguist qt5-qtmultimedia-devel qt5-qtquickcontrols2-devel qt5-qtx11extras-devel rubberband-devel taglib-devel vulkan-headers

./make_rpm.sh

Flatpak

git clone <git repository url>

cd dsnote/flatpak

# build a base package
flatpak-builder --force-clean --user --install-deps-from=flathub --repo="<name or /path/to/local/flatpak/repo>" "/path/to/output/dir" net.mkiol.SpeechNote.yaml

# build an optional NVIDIA add-on package
flatpak-builder --force-clean --user --install-deps-from=flathub --repo="<name or /path/to/local/flatpak/repo>" "/path/to/output/dir" net.mkiol.SpeechNote.Addon.nvidia.yaml

Sailfish OS

git clone <git repository url>

cd dsnote
mkdir build
cd build

sfdk config --session specfile=../sfos/harbour-dsnote.spec
sfdk config --session target=SailfishOS-4.4.0.58-aarch64
sfdk cmake ../ -DCMAKE_BUILD_TYPE=Release -DWITH_SFOS=ON -DWITH_PY=OFF
sfdk package

Linux (direct build)

Speech Note has many build-time and run-time dependencies. These include shared and static libraries, 3rd-party executables, and Python and Perl scripts. Because of this complexity, the recommended way to build is to use the Flatpak toolchain (the Flatpak manifest file and flatpak-builder). A direct build (i.e. without Flatpak) is also possible, but more complicated.

git clone <git repository url>

cd dsnote
mkdir build
cd build

cmake ../ -DCMAKE_BUILD_TYPE=Release -DWITH_DESKTOP=ON
make

To build without support for Python components, add -DWITH_PY=OFF in the cmake step.

To see other build options, search for option(BUILD_XXX) in the CMakeLists.txt file.
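Instead of searching by hand, a short script can list the option() declarations. This sketch only recognizes simple single-line declarations with a quoted description; multi-line calls or options with variable defaults are skipped. When no default is given, CMake uses OFF, which the script reports. `list_cmake_options` is an illustrative name.

```python
import re
from pathlib import Path

# Single-line option(NAME "description" [default]) calls; CMake commands are case-insensitive.
OPTION_RE = re.compile(r'^\s*option\(\s*(\w+)\s+"([^"]*)"\s*(\w+)?\s*\)',
                       re.MULTILINE | re.IGNORECASE)


def list_cmake_options(cmake_text):
    """Return (name, description, default) tuples; CMake's default is OFF."""
    return [(name, desc, default or "OFF")
            for name, desc, default in OPTION_RE.findall(cmake_text)]


if __name__ == "__main__":
    cmake_lists = Path("CMakeLists.txt")  # run from the dsnote source directory
    if cmake_lists.exists():
        for name, desc, default in list_cmake_options(cmake_lists.read_text(encoding="utf-8")):
            print(f"-D{name}={default:<4} {desc}")
```

Each printed line is in the same -DNAME=VALUE form used in the cmake step above.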

How to enable a custom model

All models available for download are specified in the configuration file (config/models.json). To enable a custom model that is compatible with currently supported engines, simply edit this file and restart the application.

When you first run the application, the models configuration file is created in:

~/.local/share/net.mkiol/dsnote/models.json, or
~/.var/app/net.mkiol.SpeechNote/data/net.mkiol/dsnote/models.json (Flatpak), or
~/.local/share/org.mkiol/dsnote/models.json (Sailfish OS)
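A script that edits this file can pick the right location from the list above. This minimal sketch assumes the default locations (it ignores any XDG_DATA_HOME override); `models_json_path` and the variant names are hypothetical.

```python
from pathlib import Path

# Default locations of models.json, relative to the home directory.
MODELS_JSON = {
    "desktop": ".local/share/net.mkiol/dsnote/models.json",
    "flatpak": ".var/app/net.mkiol.SpeechNote/data/net.mkiol/dsnote/models.json",
    "sailfish": ".local/share/org.mkiol/dsnote/models.json",
}


def models_json_path(variant, home=None):
    """Expected models.json path for 'desktop', 'flatpak' or 'sailfish'."""
    base = Path.home() if home is None else Path(home)
    return base / MODELS_JSON[variant]


if __name__ == "__main__":
    for variant in MODELS_JSON:
        path = models_json_path(variant)
        print(f"{variant:9}{path}{'  (exists)' if path.exists() else ''}")
```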

You can freely edit currently enabled models or add new ones.

Model definition looks like this:

{
    "name": "<model name>",
    "model_id": "<model unique id>",
    "engine": "<engine type>",
    "lang_id": "<lang id>",
    "checksum": "<md5 checksum>",
    "checksum_quick": "<partial md5 checksum>",
    "comp": "<compression type>",
    "urls": [
        <model URLs>
    ],
    "size": "<download size of all files>"
}

Allowed engine types: stt_ds, stt_vosk, stt_april, stt_whisper, stt_fasterwhisper, tts_piper, tts_rhvoice, tts_espeak, tts_coqui, tts_mimic3, mnt_bergamot

Allowed compression types: none, gz, xz, tarxz, targz, zip, zipall, dir, dirgz

Allowed URL types: http, https, file
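Before restarting the application, a quick sanity check of a hand-written entry can catch typos in the engine or compression type. The sketch below is a hypothetical validator, not the app's own parser. It treats every key from the template above as required (the app may be more lenient) and only knows the values listed here, so any engine type not in this list is reported as unknown.

```python
from urllib.parse import urlparse

# Allowed values as listed above (engine types not listed there are missing here).
ENGINES = {"stt_ds", "stt_vosk", "stt_april", "stt_whisper", "stt_fasterwhisper",
           "tts_piper", "tts_rhvoice", "tts_espeak", "tts_coqui", "tts_mimic3",
           "mnt_bergamot"}
COMPRESSIONS = {"none", "gz", "xz", "tarxz", "targz", "zip", "zipall", "dir", "dirgz"}
URL_SCHEMES = {"http", "https", "file"}
# Keys from the model definition template; the app itself may accept fewer.
REQUIRED_KEYS = ("name", "model_id", "engine", "lang_id", "checksum",
                 "checksum_quick", "comp", "urls", "size")

EXAMPLE = {
    "name": "New Piper Voice",
    "model_id": "en_piper_new",
    "engine": "tts_piper",
    "lang_id": "en",
    "checksum": "",
    "checksum_quick": "",
    "size": "",
    "comp": "dir",
    "urls": ["file:///home/me/models/new-model-medium.onnx",
             "file:///home/me/models/new-model-medium.onnx.json"],
}


def validate_model(entry):
    """Return a list of problems in one model definition (empty list if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    if "engine" in entry and entry["engine"] not in ENGINES:
        problems.append(f"unknown engine: {entry['engine']}")
    if "comp" in entry and entry["comp"] not in COMPRESSIONS:
        problems.append(f"unknown compression: {entry['comp']}")
    for url in entry.get("urls", []):
        if urlparse(url).scheme not in URL_SCHEMES:
            problems.append(f"unsupported URL type: {url}")
    return problems


if __name__ == "__main__":
    print(validate_model(EXAMPLE))  # an empty list means no problems were found
```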

Checksums are calculated for all files after unpacking. If you are adding a new model, you can use the --gen-checksums command-line option to find the right checksums. To do this, put empty strings in both checksum and checksum_quick, save the file and run Speech Note with this option.

For example:

{
    "name": "New Piper Voice",
    "model_id": "en_piper_new",
    "engine": "tts_piper",
    "lang_id": "en",
    "checksum": "",
    "checksum_quick": "",
    "size": "",
    "comp": "dir",
    "urls": [
        "file:///home/me/models/new-model-medium.onnx",
        "file:///home/me/models/new-model-medium.onnx.json"
    ]
}

flatpak run net.mkiol.SpeechNote --verbose --gen-checksums
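The first step (adding an entry with empty checksums) can also be scripted. This is a minimal sketch that assumes models.json has a top-level "models" array; check your own file before using it. It keeps a .bak copy of the original file, and `demo()` runs it on a scratch file rather than your real configuration. The names `add_model` and `demo` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def add_model(models_json, entry):
    """Append a model definition with empty checksums to models.json.

    Assumption: the file has a top-level "models" array. A .bak copy is kept.
    """
    path = Path(models_json)
    original = path.read_text(encoding="utf-8")
    data = json.loads(original)
    models = data.setdefault("models", [])
    if any(m.get("model_id") == entry["model_id"] for m in models):
        raise ValueError(f"model_id already exists: {entry['model_id']}")
    # Empty checksums, as required before running --gen-checksums.
    models.append({**entry, "checksum": "", "checksum_quick": ""})
    path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
    path.write_text(json.dumps(data, indent=4, ensure_ascii=False) + "\n", encoding="utf-8")
    return data


def demo():
    """Run add_model() on a scratch file instead of the real configuration."""
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp) / "models.json"
        scratch.write_text(json.dumps({"models": []}), encoding="utf-8")
        add_model(scratch, {
            "name": "New Piper Voice",
            "model_id": "en_piper_new",
            "engine": "tts_piper",
            "lang_id": "en",
            "size": "",
            "comp": "dir",
            "urls": ["file:///home/me/models/new-model-medium.onnx",
                     "file:///home/me/models/new-model-medium.onnx.json"],
        })
        return json.loads(scratch.read_text(encoding="utf-8"))


if __name__ == "__main__":
    print(json.dumps(demo(), indent=4))
```

After adding an entry to your real file, run Speech Note with --verbose --gen-checksums as described above to obtain the checksums.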

Contributing to Speech Note

Any contribution is very welcome!

The project is hosted on both GitHub and GitLab. Feel free to make a PR/MR, report an issue or request a new feature on whichever platform you prefer.

Translation

Translation files in Qt format are in the translations directory.

The preferred way to contribute translations is via the Transifex service, but direct PRs/MRs are also welcome.

How to support

If you find Speech Note useful and would like to support this project, please consider doing one or two of the following:

Give a ⭐ on GitHub and/or GitLab.
Write a review in your applications manager app (Discover, Software or any other).
Tell others about this app by mentioning it on social media.
If you have spare money, make a small donation via ko-fi (one time) or Liberapay (recurring).

Libraries

Speech Note relies on the open source projects listed under "3rd party libraries" in the License section.

Reviews and demos

Speech Note 4.8 changes video (Speech Note 4.8)
Speech Note 4.7 changes video (Speech Note 4.7)
Speech Note 4.6 changes video (Speech Note 4.6)
Speech Note 4.5 changes video (Speech Note 4.5)
Screenshots (Speech Note 4.5)
Thejesh GN blog (Speech Note 4.7)
LinuxD0 video (Speech Note 4.7, Spanish)
Guia Linux video (Speech Note 4.7, Portuguese)
lwn.net (Speech Note 4.6)
Softpedia (Speech Note 4.6)
OSTechNix (Speech Note 4.6)
Best FREE Speech-to-Text For Linux Mint video (Speech Note 4.6)
Marco's Box (Speech Note 4.4, Italian)
Marco's Box video (Speech Note 4.4, Italian)
alternativalinux (Speech Note 4.4, Italian)
alternativalinux video (Speech Note 4.4, Italian)
ZDNET (Speech Note 4.2)
Translator feature video demo on Sailfish OS (Speech Note 4.0)
Translator feature video demo on PinePhone (Speech Note 4.0)
DebugPoint.com (Speech Note 4.0)
DebugPoint.com video (Speech Note 4.0)
OMG! Linux (Speech Note 4.0)
LinuxLinks (Speech Note 4.0)
The Linux Cast video (Speech Note 4.0)
CONNECTwww.com (Speech Note 4.0)

License

Speech Note is an open source project. Source code is released under the Mozilla Public License Version 2.0.

3rd party libraries:

Coqui STT, released under the Mozilla Public License Version 2.0
Coqui TTS, released under the Mozilla Public License Version 2.0
Vosk API, released under the Apache License 2.0
whisper.cpp, released under the MIT License
WebRTC, released under this license
libarchive, released under the BSD License
RNNoise-nu, released under the BSD 3-Clause License
{fmt}, released under this license
Hugging Face Transformers, released under the Apache License 2.0
Piper, released under the MIT License
RHVoice, released under the GNU General Public License v2.0
ssplit-cpp, released under the Apache License 2.0
espeak-ng, released under the GNU General Public License v3.0
bergamot-translator, released under the Mozilla Public License 2.0
Rubber Band Library, released under the GNU General Public License (version 2 or later)
simdjson, released under the Apache License 2.0
Nlohmann JSON, released under the MIT License
uroman, released under this license
astrunc, released under the MIT License
FFmpeg, released under the GNU Lesser General Public License version 2.1 or later
LAME, released under the LGPL
Vorbis, released under this license
TagLib, released under the GNU Lesser General Public License (LGPL) and Mozilla Public License (MPL)
libnumbertext, released under the BSD License
KDBusAddons, released under the LGPL licenses
QHotkey, released under the BSD-3-Clause License
faster-whisper, released under the MIT License
Mimic 3, released under the AGPL-3.0 license
Unikud, released under the MIT License
april-asr, released under the GNU General Public License v3.0
libopus, released under this license
html2md, released under the MIT License
maddy, released under the MIT License
WhisperSpeech, released under the MIT License
Kokoro, released under the Apache License 2.0
Parler-TTS, released under the Apache License 2.0
F5-TTS, released under the MIT License

The files in the directory nonbreaking_prefixes were copied from mosesdecoder project and distributed under the GNU Lesser General Public License v2.1.
