KServe

KServe provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high-abstraction interfaces for common ML frameworks such as TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX.

It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting-edge serving features such as GPU autoscaling, scale to zero, and canary rollouts to your ML deployments. It enables a simple, pluggable, and complete story for production ML serving, including prediction, pre-processing, post-processing, and explainability. KServe is used across various organizations.

Architecture Review

Control Plane and Data Plane

Core Features and Examples

Features and Examples

Learn More

To learn more about KServe, how to deploy it as part of Kubeflow, how to use the supported features, and how to participate in the KServe community, please follow the KServe website documentation. Additionally, we have compiled a list of presentations and demos that dive into various details.

Prerequisites

Kubernetes 1.17 is the minimum recommended version. Knative Serving and Istio should be available on the Kubernetes cluster.

Istio: v1.9.0+
- KServe currently only depends on Istio Ingress Gateway to route requests to inference services externally or internally. If you do not need Service Mesh, we recommend turning off Istio sidecar injection.
Knative Serving: v0.19.0+
- If you are running in Service Mesh mode with authorization, please follow the Knative documentation to set up the authorization policies.
- If you are looking to use PodSpec fields such as nodeSelector, affinity or tolerations which are now supported in the v1beta1 API spec, you need to turn on the corresponding feature flags in your Knative configuration.
Cert Manager: v1.3.0+
- Cert Manager is needed to provision webhook certificates for a production-grade installation. Alternatively, you can run our self-signed certificate generation script.
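
The PodSpec feature flags mentioned in the Knative Serving notes above could be enabled along the following lines. This is a minimal sketch, not the project's official procedure: the ConfigMap name (config-features), its namespace (knative-serving), and the three flag names are assumptions taken from the Knative feature-flag conventions, and the function names are hypothetical. Verify all of them against the Knative version you run.

```shell
# Print a JSON merge patch that enables the three PodSpec feature flags.
# Assumption: flag names follow the Knative "kubernetes.podspec-*" convention.
podspec_flags_patch() {
  printf '{"data":{'
  printf '"kubernetes.podspec-nodeselector":"enabled",'
  printf '"kubernetes.podspec-affinity":"enabled",'
  printf '"kubernetes.podspec-tolerations":"enabled"'
  printf '}}\n'
}

# Apply the patch to the Knative feature-flag ConfigMap.
# Assumption: the ConfigMap is config-features in the knative-serving namespace.
# This function needs cluster access and is not called automatically.
enable_podspec_flags() {
  kubectl patch configmap config-features -n knative-serving \
    --type merge -p "$(podspec_flags_patch)"
}
```

Running podspec_flags_patch on its own only prints the patch, which makes it easy to review before calling enable_podspec_flags against a cluster.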

Installation

Standalone Installation

KServe can be installed standalone if your Kubernetes cluster meets the above prerequisites. KServe is deployed in the kserve namespace.

TAG=v0.7.0-rc0

Install KServe CRD and Controller

Due to a performance issue applying deeply nested CRDs, please ensure that your kubectl version fits into one of the following categories to ensure that you have the fix: >=1.16.14,<1.17.0 or >=1.17.11,<1.18.0 or >=1.18.8.
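
The version ranges above can be checked mechanically before applying the manifest. The sketch below is one possible way to do that; kubectl_version_ok is a hypothetical helper name, and it expects a full major.minor.patch version string that you pass in yourself (for example, copied from the output of kubectl version --client).

```shell
# Return 0 if the given kubectl version (e.g. "1.18.8" or "v1.18.8") falls in
# one of the ranges listed above, 1 if it does not, 2 if it cannot be parsed.
kubectl_version_ok() {
  v=${1#v}                                  # strip a leading "v" if present
  case $v in *.*.*) ;; *) return 2 ;; esac  # require major.minor.patch
  major=${v%%.*}
  rest=${v#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  patch=${patch%%[!0-9]*}                   # drop suffixes such as "-gke.100"
  for part in "$major" "$minor" "$patch"; do
    case $part in ''|*[!0-9]*) return 2 ;; esac
  done
  if [ "$major" -gt 1 ]; then return 0; fi
  if [ "$major" -lt 1 ]; then return 1; fi
  case $minor in
    16) [ "$patch" -ge 14 ] ;;              # >=1.16.14,<1.17.0
    17) [ "$patch" -ge 11 ] ;;              # >=1.17.11,<1.18.0
    18) [ "$patch" -ge 8 ] ;;               # >=1.18.8
    *)  [ "$minor" -gt 18 ] ;;              # anything newer than 1.18
  esac
}
```

For example, kubectl_version_ok 1.17.11 succeeds, while kubectl_version_ok 1.17.0 fails because that release predates the fix.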

kubectl apply -f https://github.com/kserve/kserve/releases/download/$TAG/kserve.yaml

Quick Install (On your local machine)

Make sure you have kubectl installed.

If you do not have an existing Kubernetes cluster, you can quickly create a local one with kind.

Note that the minimum requirement for running KServe is 4 CPUs and 8Gi of memory, so you need to change the Docker resource settings to use 4 CPUs and 8Gi of memory.

kind create cluster

Alternatively, you can use Minikube:

minikube start --cpus 4 --memory 8192
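
Before creating the local cluster, it can help to confirm that the machine or Docker VM actually meets the 4 CPU / 8Gi minimum stated above. The following is a small sketch under that assumption; meets_kserve_minimum is a hypothetical helper, and you supply the CPU count and the memory in MiB yourself.

```shell
# Return 0 if the given resources satisfy the stated minimum
# (4 CPUs and 8Gi = 8192 MiB of memory), 1 otherwise.
# Usage: meets_kserve_minimum <cpu_count> <memory_in_MiB>
meets_kserve_minimum() {
  cpus=$1
  mem_mib=$2
  [ "$cpus" -ge 4 ] && [ "$mem_mib" -ge 8192 ]
}
```

On Docker-based setups the inputs can be read from docker info (the NCPU and MemTotal fields, the latter in bytes); the reported memory may be slightly below the nominal setting, so treat a near miss as a prompt to check the settings rather than a hard failure.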

Install a lean version of Istio, Knative Serving, and KServe all in one step (this takes 30s).

./hack/quick_install.sh

Setup Ingress Gateway

If the default ingress gateway setup does not fit your needs, you can set up a custom ingress gateway.

Configure Custom Ingress Gateway
- In addition, you need to update the ConfigMap to use the custom ingress gateway.
Configure Custom Domain
Configure HTTPS Connection

Determine the ingress IP and ports

Execute the following command to determine whether your Kubernetes cluster is running in an environment that supports external load balancers:

$ kubectl get svc istio-ingressgateway -n istio-system
NAME                   TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)   AGE
istio-ingressgateway   LoadBalancer   172.21.109.129   130.211.10.121   ...       17h

If the EXTERNAL-IP value is set, your environment has an external load balancer that you can use for the ingress gateway.

export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')

If the EXTERNAL-IP value is none (or perpetually pending), your environment does not provide an external load balancer for the ingress gateway. In this case, you can access the gateway using the service’s node port.

# GKE
export INGRESS_HOST=worker-node-address
# Minikube
export INGRESS_HOST=$(minikube ip)
# Other environment (on-prem)
export INGRESS_HOST=$(kubectl get po -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].status.hostIP}')

export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
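
The choice between the load balancer and node port paths described above depends only on the EXTERNAL-IP column. A minimal sketch of that decision follows; ingress_access_mode is a hypothetical helper, and treating an empty value the same as none or pending is an assumption.

```shell
# Map the EXTERNAL-IP value of the istio-ingressgateway service to the access
# method described above: prints "loadbalancer" or "nodeport".
ingress_access_mode() {
  case $1 in
    ''|'<none>'|'<pending>'|none|pending)
      echo nodeport ;;        # no external load balancer available
    *)
      echo loadbalancer ;;    # an external IP or hostname is set
  esac
}
```

A script could call this with the value shown by kubectl get svc and then run the matching pair of export commands from this section.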

Alternatively, you can use port forwarding for testing purposes:

INGRESS_GATEWAY_SERVICE=$(kubectl get svc --namespace istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}')
kubectl port-forward --namespace istio-system svc/${INGRESS_GATEWAY_SERVICE} 8080:80
# start another terminal
export INGRESS_HOST=localhost
export INGRESS_PORT=8080

Test Installation

Verify installation

kubectl get po -n kserve
NAME                          READY   STATUS    RESTARTS   AGE
kserve-controller-manager-0   2/2     Running   2          13m
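
The check above can also be scripted. The sketch below reads the table printed by kubectl get po on standard input and succeeds only if every pod is Running with all containers ready; all_pods_ready is a hypothetical helper, and it assumes the default column layout shown above (NAME, READY, STATUS, ...).

```shell
# Read "kubectl get po" output on stdin. Return 0 if at least one pod is listed
# and every pod has READY n/n and STATUS Running, 1 otherwise.
all_pods_ready() {
  awk '
    NR == 1 && $1 == "NAME" { next }     # skip the header row
    {
      seen = 1
      split($2, r, "/")                  # READY column, e.g. "2/2"
      if (r[1] != r[2] || $3 != "Running") bad = 1
    }
    END { exit !(seen && !bad) }
  '
}
```

For example: kubectl get po -n kserve | all_pods_ready && echo "KServe controller is ready".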

Create test inference service

Please follow the getting started guide to create your first InferenceService.

Run Performance Test

# Use kubectl create instead of apply: the job template uses generateName, which does not work with kubectl apply.
kubectl create -f docs/samples/${API_VERSION}/sklearn/v1/perf.yaml -n kserve-test
# Wait for the job to finish, then check the log (replace the pod name with the one created by your job).
kubectl logs load-test8b58n-rgfxr -n kserve-test
Requests      [total, rate, throughput]         30000, 500.02, 499.99
Duration      [total, attack, wait]             1m0s, 59.998s, 3.336ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.743ms, 2.748ms, 2.494ms, 3.363ms, 4.091ms, 7.749ms, 46.354ms
Bytes In      [total, mean]                     690000, 23.00
Bytes Out     [total, mean]                     2460000, 82.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:30000
Error Set:
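
To use the report above as a pass/fail signal, the success ratio can be pulled out of the log. This is a sketch that assumes the report layout shown above (a line starting with Success [ratio]); success_ratio and load_test_passed are hypothetical helper names, and requiring exactly 100.00% is an illustrative threshold rather than a project requirement.

```shell
# Read the load test report on stdin and print the success ratio without "%".
success_ratio() {
  awk '$1 == "Success" && $2 == "[ratio]" { sub(/%$/, "", $3); print $3 }'
}

# Return 0 only if the report on stdin shows a 100.00% success ratio.
load_test_passed() {
  ratio=$(success_ratio)
  [ "$ratio" = "100.00" ]
}
```

For example, the output of the kubectl logs command above can be piped into load_test_passed.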

Setup Monitoring

Use KServe SDK

Install the SDK
```
pip install kserve
```
See the SDK documentation here.
Follow the examples here to use the KServe SDK to create, roll out, promote, and delete an InferenceService instance.

Presentations and Demos

Roadmap

Contributor Guide

Adopters

