KFServing


KFServing provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high abstraction interfaces for common ML frameworks like Tensorflow, XGBoost, ScikitLearn, PyTorch, and ONNX.

It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting edge serving features like GPU Autoscaling, Scale to Zero, and Canary Rollouts to your ML deployments. It enables a simple, pluggable, and complete story for Production ML Serving including prediction, pre-processing, post-processing and explainability. KFServing is being used across various organizations.


Learn More

To learn more about KFServing, how to deploy it as part of Kubeflow, how to use various supported features, and how to participate in the KFServing community, please follow the KFServing docs on the Kubeflow Website. Additionally, we have compiled a list of KFServing presentations and demos to dive into the details.

Prerequisites

Knative Serving and Istio should be available on the Kubernetes cluster; Knative depends on the Istio ingress gateway to route requests to Knative services. To use the exact versions tested by the Kubeflow and KFServing teams, please refer to the prerequisites in the developer guide.

  • Istio: v1.1.6+. If you want to get Knative up and running quickly, or you do not need a service mesh, we recommend installing Istio without the service mesh (sidecar injection).
  • Knative Serving: currently only Knative Serving is required. cluster-local-gateway is needed to serve cluster-internal traffic for transformer and explainer use cases; please follow the instructions here to install the cluster-local gateway.
  • Cert manager: needed to provision KFServing webhook certs for a production-grade installation; alternatively, you can run our self-signed certs generation script. An example cert manager install is sketched below.
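
    A production-grade setup typically installs cert manager from its published manifest before installing KFServing. This is a minimal sketch; the manifest URL and version are assumptions and may differ from what your KFServing release was tested against:

    # Install cert-manager (version is an assumption; use the release tested with your KFServing version)
    kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v0.15.1/cert-manager.yaml
    # Wait for the cert-manager pods to become ready before installing KFServing
    kubectl wait --for=condition=Ready pods --all -n cert-manager --timeout=120s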

    Install KFServing

    Standalone KFServing Installation

    KFServing can be installed standalone if your Kubernetes cluster meets the above prerequisites; the KFServing controller is deployed in the kfserving-system namespace.

    For Kubernetes 1.16+ users

    TAG=v0.4.0
    kubectl apply -f ./install/$TAG/kfserving.yaml
    

    For Kubernetes 1.14/1.15 users

    TAG=v0.4.0
    kubectl apply -f ./install/$TAG/kfserving.yaml --validate=false
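
    Either way, you can do a quick sanity check that the InferenceService CRD was registered; the CRD name below assumes the serving.kubeflow.org API group used by these releases:

    kubectl get crd inferenceservices.serving.kubeflow.org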
    

    KFServing uses a pod mutator, or mutating admission webhook, to inject the storage initializer component of KFServing. By default, all pods in namespaces that are not labelled with the control-plane label go through the pod mutator. This can cause problems and interfere with the Kubernetes control plane when the KFServing pod mutator webhook is not yet in a ready state.

    For Kubernetes 1.14 users, we suggest enabling the ENABLE_WEBHOOK_NAMESPACE_SELECTOR environment variable so that only pods in namespaces labelled serving.kubeflow.org/inferenceservice: enabled go through the KFServing pod mutator.

    env:
    - name: ENABLE_WEBHOOK_NAMESPACE_SELECTOR
      value: enabled
    

    As of the KFServing 0.4 release, the object selector is turned on by default, so the KFServing pod mutator is only invoked for KFServing InferenceService pods. For prior releases, you can turn it on manually by running the following command.

    kubectl patch mutatingwebhookconfiguration inferenceservice.serving.kubeflow.org --patch '{"webhooks":[{"name": "inferenceservice.kfserving-webhook-server.pod-mutator","objectSelector":{"matchExpressions":[{"key":"serving.kubeflow.org/inferenceservice", "operator": "Exists"}]}}]}'
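
    To confirm the patch (or the 0.4 default) took effect, you can inspect the webhook's object selector; the webhook name and index below mirror the patch command above and may differ between releases:

    kubectl get mutatingwebhookconfiguration inferenceservice.serving.kubeflow.org -o jsonpath='{.webhooks[0].objectSelector}'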

    KFServing in Kubeflow Installation

    KFServing is installed by default as part of the Kubeflow installation using the Kubeflow manifests, and the KFServing controller is deployed in the kubeflow namespace. Since Kubeflow's minimal Kubernetes requirement is 1.14, which does not support object selectors, ENABLE_WEBHOOK_NAMESPACE_SELECTOR is enabled in the Kubeflow installation by default. If you are using the Kubeflow dashboard or profile controller to create user namespaces, the label is added automatically so that KFServing can deploy models there. If you are creating namespaces manually using the Kubernetes APIs directly, you will need to add the label serving.kubeflow.org/inferenceservice: enabled to allow deploying a KFServing InferenceService in those namespaces (see the example below), and make sure you do not deploy an InferenceService in the kubeflow namespace, which is labelled as control-plane.
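
    For example, labelling a manually created namespace might look like the following (my-namespace is a placeholder):

    kubectl create namespace my-namespace
    kubectl label namespace my-namespace serving.kubeflow.org/inferenceservice=enabled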

    Install KFServing in 5 Minutes (On your local machine)

    Make sure you have kubectl installed.

    1. If you do not have an existing Kubernetes cluster, you can create a quick local Kubernetes cluster with kind.

    Note that the minimal requirement for running KFServing is 4 CPUs and 8Gi of memory, so you need to change the Docker resource settings to use 4 CPUs and 8Gi of memory.

    kind create cluster

    Alternatively, you can use Minikube:

    minikube start --cpus 4 --memory 8192
    2. Install the lean version of Istio, Knative Serving, and KFServing all in one (this takes about 30s).
    ./hack/quick_install.sh

    Test KFServing Installation

    Check KFServing controller installation

    kubectl get po -n kfserving-system
    NAME                             READY   STATUS    RESTARTS   AGE
    kfserving-controller-manager-0   2/2     Running   2          13m

    Please refer to our troubleshooting section for recommendations and tips for issues with installation.

    Create KFServing test inference service

    kubectl create namespace kfserving-test
    kubectl apply -f docs/samples/sklearn/sklearn.yaml -n kfserving-test
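
    For reference, the sample manifest defines a minimal InferenceService. A sketch along these lines (the API version and storageUri shown are assumptions and may differ from the file in this repo):

    apiVersion: serving.kubeflow.org/v1alpha2
    kind: InferenceService
    metadata:
      name: sklearn-iris
    spec:
      default:
        predictor:
          sklearn:
            storageUri: gs://kfserving-samples/models/sklearn/iris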

    Check KFServing InferenceService status.

    kubectl get inferenceservices sklearn-iris -n kfserving-test
    NAME           URL                                                              READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
    sklearn-iris   http://sklearn-iris.kfserving-test.example.com/v1/models/sklearn-iris   True    100                                109s

    Determine the ingress IP and ports

    Execute the following command to determine whether your Kubernetes cluster is running in an environment that supports external load balancers:

    $ kubectl get svc istio-ingressgateway -n istio-system
    NAME                   TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)   AGE
    istio-ingressgateway   LoadBalancer   172.21.109.129   130.211.10.121   ...       17h

    If the EXTERNAL-IP value is set, your environment has an external load balancer that you can use for the ingress gateway.

    export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')

    If the EXTERNAL-IP value is none (or perpetually pending), your environment does not provide an external load balancer for the ingress gateway. In this case, you can access the gateway using the service’s node port.

    # GKE
    export INGRESS_HOST=worker-node-address
    # Minikube
    export INGRESS_HOST=$(minikube ip)
    # Other environments (on-prem)
    export INGRESS_HOST=$(kubectl get po -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].status.hostIP}')
    
    export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')

    Alternatively, you can use port forwarding for testing purposes:

    INGRESS_GATEWAY_SERVICE=$(kubectl get svc --namespace istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}')
    kubectl port-forward --namespace istio-system svc/${INGRESS_GATEWAY_SERVICE} 8080:80
    # start another terminal
    export INGRESS_HOST=localhost
    export INGRESS_PORT=8080

    Curl the InferenceService

    Curl from ingress gateway

    SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -n kfserving-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict -d @./docs/samples/sklearn/iris-input.json
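
    If you prefer not to reference the sample file, the same request can be sent with an inline body; the feature values below are illustrative only:

    curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict \
      -d '{"instances": [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]}'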

    Curl from local cluster gateway

    curl -v http://sklearn-iris.kfserving-test/v1/models/sklearn-iris:predict -d @./docs/samples/sklearn/iris-input.json
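
    The cluster-local address only resolves inside the cluster, so this request is typically issued from a pod. A minimal sketch using a throwaway curl pod (the image and the inline request body are assumptions):

    kubectl run curl-test -n kfserving-test --image=curlimages/curl --rm -it --restart=Never -- \
      curl -v http://sklearn-iris.kfserving-test/v1/models/sklearn-iris:predict \
      -d '{"instances": [[6.8, 2.8, 4.8, 1.4]]}'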

    Run Performance Test

    kubectl create -f docs/samples/sklearn/perf.yaml -n kfserving-test
    # wait for the job to finish, then check the log
    kubectl logs load-test8b58n-rgfxr -n kfserving-test
    Requests      [total, rate, throughput]         30000, 500.02, 499.99
    Duration      [total, attack, wait]             1m0s, 59.998s, 3.336ms
    Latencies     [min, mean, 50, 90, 95, 99, max]  1.743ms, 2.748ms, 2.494ms, 3.363ms, 4.091ms, 7.749ms, 46.354ms
    Bytes In      [total, mean]                     690000, 23.00
    Bytes Out     [total, mean]                     2460000, 82.00
    Success       [ratio]                           100.00%
    Status Codes  [code:count]                      200:30000
    Error Set:

    Setup Ingress Gateway

    If the default ingress gateway setup does not fit your needs, you can choose to set up a custom ingress gateway.

    Setup Monitoring

    Use KFServing SDK

    • Install the SDK

      pip install kfserving
      
    • Get the KFServing SDK documents from here.

    • Follow the example here to use the KFServing SDK to create, rollout, promote, and delete an InferenceService instance.

    KFServing Features and Examples

    KFServing Features and Examples

    KFServing Presentations and Demoes

    KFServing Presentations and Demoes

    KFServing Roadmap

    KFServing Roadmap

    KFServing Concepts and Data Plane

    KFServing Concepts and Data Plane

    KFServing API Reference

    KFServing API Docs

    KFServing Debugging Guide ⭐

    Debug KFServing InferenceService

    Developer Guide

    Developer Guide.

    Performance Tests

    KFServing benchmark test comparing Knative and Kubernetes Deployment with HPA

    Performance Tests

    Contributor Guide

    Contributor Guide

    KFServing Adopters

    KFServing Adopters
