A wrapper for Hugging Face sentence transformer models with an OpenAI-compatible API.
This is a wrapper for Hugging Face sentence transformer models. It uses FastAPI to expose an HTTP API that is compatible with the OpenAI text embeddings API. A Dockerfile is included to build an image based on Uvicorn with the CPU-only version of PyTorch.
The models to load are given via the MODELS environment variable; multiple models can be listed in a comma-separated string.
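For example, a MODELS value for two models could look like this (the model names are only illustrative; any sentence transformer model ID from Hugging Face should follow the same format):

```shell
# Comma-separated list of Hugging Face model IDs, no spaces between entries.
MODELS="sentence-transformers/all-MiniLM-L6-v2,sentence-transformers/all-mpnet-base-v2"
```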
The following command will run the latest prebuilt image:
docker run -it --rm -p 8080:80 ghcr.io/bergos/embedding-server:latest
Open http://localhost:8080/docs in your browser to explore the API in the interactive documentation UI.
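Once the container is running, embeddings can also be requested directly over HTTP. The sketch below assumes the OpenAI-compatible endpoint is served at /v1/embeddings and that the model name matches one of the models listed in MODELS:

```shell
# Request an embedding for a single input string.
# Endpoint path and model name are assumptions, not confirmed by this README.
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "sentence-transformers/all-MiniLM-L6-v2", "input": "Hello, world!"}'
```

The response follows the OpenAI embeddings format, so existing OpenAI client libraries can be pointed at the local server by changing their base URL.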
The Dockerfile downloads the models at build time, so no persistence is required at runtime.
To build the image run:
docker build -t embedding-server .
And to spin up a local instance on port 8080:
docker run -it --rm -p 8080:80 embedding-server