8000 Triton Inference Server: Transformation Plugin or Inference Extension - nvidia llm-router demo · Issue #616 · envoyproxy/ai-gateway · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content
Triton Inference Server: Transformation Plugin or Inference Extension - nvidia llm-router demo #616
Open
@smarunich

Description

@smarunich

Based on https://github.com/NVIDIA-AI-Blueprints/llm-router where the flow is following: https://github.com/smarunich/llm-router/blob/main/llm-router-request-flow.md

How to achieve the Step 2? (https://github.com/smarunich/llm-router/blob/main/llm-router-request-flow.md#2-router-controller-to-router-server-triton)

i.e. should the transformation plugin be used with AI gateway or what is the framework to create a custom transformation plugin for below?

  1. Client sends OpenAI-compatible request to the gateway
  2. Gateway needs to transform this request into a format accepted by Triton Inference Server
  3. Specifically, we need to extract the last user message from the request and format it as:
{
  "inputs": [
    {
      "name": "INPUT",
      "datatype": "BYTES",
      "shape": [1, 1],
      "data": [["User message content"]]
    }
  ]
}

or should be the EPP: https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/epp/README.md or https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/config/charts/inferencepool what used with Envoy AI Gateway to accomplish so?

Trying to understand the functionality of Envoy AI Gateway and how it can be consumed if it is already exist more concrete examples will help?

The https://github.com/NVIDIA-AI-Blueprints/llm-router project uses a single triton-server so test example probably will be more simple than a production version

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0