mt_stt

Marcel Timm, RhinoDevel, 2025

mt_stt is a C++ library for Linux and Windows that offers a pure C interface to Whisper.cpp, the great speech-to-text inference engine by Georgi Gerganov, which in turn runs OpenAI Whisper models.

With mt_stt you can:

Transcribe raw audio held in memory to a string.
Use a model that is loaded from a file or already held in memory.
Translate to English.
Add an optional initial prompt (to bias/help the transcription process).
Use a progress callback and a cancel option.
Optionally transcribe only a specific part of the audio data.
Output the probabilities of the transcribed words (how sure the model is that each word is the correct result).

How To

After cloning the mt_stt repository, enter its folder and proceed as follows.

Get the Whisper.cpp submodule content:

git submodule update --init --recursive

Linux

There are no Linux-specific details here yet, but you can take a look at the Windows instructions below and at the Makefile.

Windows

All of the following examples build static libraries. There may be use cases where dynamically linked libraries are sufficient, too.

Build Whisper.cpp

Compile `whisper.lib` and `ggml.lib` as static libraries

Compile the required whisper.lib and ggml.lib as static libraries with Visual Studio, using mt_stt/whisper.cpp/CMakeLists.txt.

To do that, modify the file mt_stt/whisper.cpp/CMakePresets.json, which is created by Visual Studio:

If the git binary is not in your path, modify the "configurePresets" entry named "windows-base" by adding the following entry to its "cacheVariables":

"GIT_EXE": "C:\\Program Files\\Git\\bin\\git.exe"

Add entry

{
  "name": "mt-x64-release-static",
  "displayName": "MT x64 Release Static (native)",
  "description": "MT: Target Windows (64-bit), static, with the Visual Studio development environment. (RelWithDebInfo)",
  "inherits": "x64-release",
  "cacheVariables": {
    "BUILD_SHARED_LIBS": "OFF"
  }
}

to mt_stt/whisper.cpp/CMakePresets.json's configurePresets array.

OpenBLAS build

Download OpenBLAS (e.g. OpenBLAS-0.3.29-x64.zip) and unpack the content to C:\openblas.

Additionally add entry

{
  "name": "mt-x64-release-static-blas",
  "displayName": "MT x64 Release Static BLAS",
  "description": "MT: Target Windows (64-bit), static, BLAS, with the Visual Studio development environment. (RelWithDebInfo)",
  "inherits": "mt-x64-release-static",
  "cacheVariables": {
    "GGML_BLAS": "ON",
    "BLAS_LIBRARIES": "C:/openblas/lib/libopenblas.lib",
    "BLAS_INCLUDE_DIRS": "C:/openblas/include"
  }
}

Put the libopenblas.dll (from C:\openblas\bin\libopenblas.dll) into the folder of the executable file that will be linked with this project's resulting DLL.

CUDA build

Tested with (e.g.) CUDA 12.4.131 and Whisper.cpp v1.7.5.

Additionally add entry

{
  "name": "mt-x64-release-static-cuda",
  "displayName": "MT x64 Release Static CUDA (native)",
  "description": "MT: Target Windows (64-bit), static, CUDA, with the Visual Studio development environment. (RelWithDebInfo)",
  "inherits": "mt-x64-release-static",
  "cacheVariables": {
    "GGML_CUDA": "ON"
  }
}

to mt_stt/whisper.cpp/CMakePresets.json's configurePresets array.

In mt_stt, link with these libraries (e.g. from C:\cuda\lib\):

x64\cublas.lib
x64\cuda.lib
x64\cudart.lib

Put the following files (e.g. from C:\cuda\bin) into the folder of the executable file that will be linked with this project's resulting DLL:

cublas64_12.dll
cublasLt64_12.dll
cudart64_12.dll

On a non-development PC, make sure that the most recent Nvidia drivers are installed (they include CUDA support).

Build for non-AVX processors (e.g. Celeron)

Additionally add entry

{
  "name": "mt-x64-release-static-sse",
  "displayName": "MT x64 Release Static SSE",
  "description": "MT: Target Windows (64-bit), static, SSE, with the Visual Studio development environment. (RelWithDebInfo)",
  "inherits": "mt-x64-release-static",
  "cacheVariables": {
    "GGML_NATIVE": "OFF",
    "GGML_AVX": "OFF",
    "GGML_AVX2": "OFF"
  }
}

to mt_stt/whisper.cpp/CMakePresets.json's configurePresets array.

Also change the line

#if defined(_MSC_VER) && (defined(__AVX__) || defined(__AVX2__) || defined(__AVX512F__))

to

#if defined(_MSC_VER)// && (defined(__AVX__) || defined(__AVX2__) || defined(__AVX512F__))

in the file

mt_stt/whisper.cpp/ggml/src/ggml-cpu/ggml-cpu-impl.h

before the line

#ifndef __SSE3__

to enable SSE3 and SSSE3.
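To decide whether a target machine needs this build at all, it can help to check at runtime whether AVX is available. The following is only a minimal sketch and not part of mt_stt or Whisper.cpp: the function name cpu_has_avx is hypothetical. It assumes an x86 target and either MSVC (using __cpuid and _xgetbv) or GCC/Clang (using __builtin_cpu_supports). On any other target it simply reports false.

```c
#include <stdbool.h>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>    /* __cpuid */
#include <immintrin.h> /* _xgetbv */
#endif

/* Hypothetical helper (not part of mt_stt or Whisper.cpp):
 * Returns true, if both the CPU and the operating system support AVX.
 */
bool cpu_has_avx(void)
{
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4] = { 0, 0, 0, 0 };

    /* CPUID leaf 1 returns the feature flags in ECX, which is regs[2]: */
    __cpuid(regs, 1);

    bool const osxsave = (regs[2] & (1 << 27)) != 0; /* XGETBV is usable. */
    bool const avx = (regs[2] & (1 << 28)) != 0;     /* CPU offers AVX. */

    if(!osxsave || !avx)
    {
        return false;
    }

    /* XCR0 bits 1 and 2: The OS saves/restores XMM and YMM registers. */
    return (_xgetbv(0) & 0x6) == 0x6;
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
#else
    return false; /* Not an x86 target or an unknown compiler. */
#endif
}
```

If such a check returns false on the target machine (e.g. on some Celeron processors), the SSE preset described in this section is probably the one to use.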

Build for non-AVX processors (e.g. Celeron), with OpenBLAS

Download OpenBLAS (e.g. OpenBLAS-0.3.29-x64.zip) and unpack the content to C:\openblas.

Additionally add the following entry (and do not forget the change to ggml-cpu-impl.h described above)

{
  "name": "mt-x64-release-static-sse-blas",
  "displayName": "MT x64 Release Static SSE and BLAS",
  "description": "MT: Target Windows (64-bit), static, SSE, BLAS, with the Visual Studio development environment. (RelWithDebInfo)",
  "inherits": "mt-x64-release-static-sse",
  "cacheVariables": {
    "GGML_BLAS": "ON",
    "BLAS_LIBRARIES": "C:/openblas/lib/libopenblas.lib",
    "BLAS_INCLUDE_DIRS": "C:/openblas/include"
  }
}

Put the libopenblas.dll (from C:\openblas\bin\libopenblas.dll) into the folder of the executable file that will be linked with this project's resulting DLL.

Build mt_stt

Open solution mt_stt.sln with Visual Studio (tested with 2022).
Compile in release or debug mode.

Test mt_stt

The sample code below uses mt_tts, which is more or less the counterpart to this project.
Follow the "Test mt_tts" instructions of mt_tts first.
Get the DLL and LIB files that result from building this project (e.g. x64\Release\mt_stt.dll and x64\Release\mt_stt.lib for release mode) and copy them to the folder used in "Test mt_tts".
Also copy the file mt_stt\mt_stt.h to that folder.
Copy a Whisper(.cpp) model file that supports translation to English to the same folder.
Open the x64 Native Tools Command Prompt for VS 2022.
Go to the example folder and put the following code into the already existing file main.c:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "mt_tts.h"
#include "mt_stt.h"

/** Example use of mt_stt transcribing & translating German language audio to
 *  text in English.
 *
 *  The audio is generated first with mt_tts.
 */
int main(void)
{
    int16_t* tts_result = NULL;
    int sample_count = -1;
    float* stt_input = NULL;
    char* stt_result = NULL;

    // *************************************************************************
    // *** TTS: Create raw audio data from a text given in German:           ***
    // *************************************************************************

    // Initialize TTS system with a model/voice for output in German:
    mt_tts_reinit("de_DE-thorsten-high.onnx", "de_DE-thorsten-high.onnx.json");

    // Get the actual raw audio data:
    tts_result = mt_tts_to_raw(
        "Hallo! Dies ist ein Text in deutscher Sprache. Erst wird er in ein Tonsignal umgewandelt, welches dann wiederum in Text transkribiert wird, jedoch nun auf Englisch.",
        &sample_count);

    // Convert the audio data into normalized floating-point representation:

    stt_input = malloc(sample_count * sizeof *stt_input);

    for(int i = 0; i < sample_count; ++i)
    {
        // Scale signed 16-bit samples to the range [-1.0, 1.0):
        stt_input[i] = (float)tts_result[i] / 32768.0f;
    }

    // Free memory and de-initialize TTS system:

    mt_tts_free_raw(tts_result);
    tts_result = NULL;

    mt_tts_deinit();
    
    // *************************************************************************
    // *** STT: Transcribe the audio while also translating it to English:   ***
    // *************************************************************************
    
    stt_result = mt_stt_transcribe_with_file(
        false,
        4,
        NULL,
        true,
        NULL,
        "ggml-small-q5_1.bin",
        stt_input,
        sample_count,
        NULL,
        NULL,
        NULL,
        NULL,
        NULL,
        NULL,
        0);

    // Output the translated transcription of the spoken text:
    printf("%s\n", stt_result);

    // Free memory and exit:
    free(stt_result);
    stt_result = NULL;
    return 0;
}

Compile via cl main.c mt_tts.lib mt_stt.lib.
Run main.exe, which should show the transcription/translation result.
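The conversion step of the sample (signed 16-bit samples to normalized floats) can also be put into a small helper that checks its input and the allocation result. This is only a sketch: the function name pcm16_to_float is hypothetical and not part of mt_stt or mt_tts. It assumes that the audio is signed 16-bit PCM and that the caller frees the returned buffer with free().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of mt_stt or mt_tts):
 * Converts signed 16-bit PCM samples to floats in the range [-1.0, 1.0).
 *
 * Returns a newly allocated buffer holding sample_count floats. The caller
 * takes ownership and must free() it. Returns NULL on invalid input or if
 * the allocation fails.
 */
float* pcm16_to_float(int16_t const * const samples, int const sample_count)
{
    float* result = NULL;

    if(samples == NULL || sample_count <= 0)
    {
        return NULL;
    }

    result = malloc((size_t)sample_count * sizeof *result);
    if(result == NULL)
    {
        return NULL;
    }

    for(int i = 0; i < sample_count; ++i)
    {
        /* 32768 is the magnitude of the smallest int16_t value, so the
         * result never leaves [-1.0, 1.0): */
        result[i] = (float)samples[i] / 32768.0f;
    }
    return result;
}
```

In the sample above, the buffer returned by such a helper would take the place of stt_input, and a NULL result would be a reason to stop before calling the transcription function.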

Notes

Install the Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017, 2019, and 2022 (e.g. version 14.42.34433.0).
