Description:
ProcessRequestHeaders() in the UpstreamFilter extproc explicitly sets the return status to CONTINUE_AND_REPLACE here. This causes Envoy to drop the content-length header, switching the request to chunked transfer encoding. Refer to the envoy-code.
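For reference, the ext_proc response in question looks roughly like the sketch below (prototext, field names per the envoy.service.ext_proc.v3 API; this is an illustration, not the gateway's actual code). Per the Envoy docs, CONTINUE_AND_REPLACE signals that the body will be replaced, so Envoy clears content-length:

```textproto
# Sketch of an envoy.service.ext_proc.v3.ProcessingResponse for the
# request-headers phase.
request_headers {        # HeadersResponse
  response {             # CommonResponse
    # CONTINUE keeps the original message framing (content-length preserved).
    # CONTINUE_AND_REPLACE tells Envoy the body will be replaced, so Envoy
    # removes content-length and the upstream request becomes chunked.
    status: CONTINUE_AND_REPLACE
  }
}
```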
Anthropic models hosted on GCP do not support chunked transfer encoding and fail with the error below:
```json
{
  "error": {
    "code": 400,
    "message": "Prediction on deployed model (endpoint_id: <redacted-endpoint-id>, deployed_model_id: <redacted-model-id>) failed with error: \"Bad Request\".",
    "status": "INVALID_ARGUMENT"
  }
}
```
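The framing change itself can be reproduced outside Envoy entirely. As a minimal illustration (plain Python stdlib, not gateway code): an HTTP/1.1 client sends content-length when it knows the body size, and falls back to transfer-encoding: chunked when it does not — which is what happens to the upstream request once Envoy drops the header:

```python
# Minimal illustration (stdlib only): a client that cannot determine the
# body length chunk-encodes the request instead of sending Content-Length.
import http.client
import http.server
import threading

captured = []  # request headers observed by the local test server

class Handler(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        captured.append(dict(self.headers))
        # Drain the body so the connection stays clean.
        if "chunked" in self.headers.get("Transfer-Encoding", ""):
            while True:
                size = int(self.rfile.readline().strip() or b"0", 16)
                self.rfile.read(size + 2)  # chunk data + trailing CRLF
                if size == 0:
                    break
        else:
            self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

body = b'{"max_tokens": 256}'
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])

# Bytes body: the length is known, so Content-Length is sent.
conn.request("POST", "/", body=body)
conn.getresponse().read()

# Iterable body: the length is unknown, so the request is chunk-encoded.
conn.request("POST", "/", body=iter([body]))
conn.getresponse().read()

conn.close()
server.shutdown()

print("Content-Length" in captured[0])        # True
print(captured[1].get("Transfer-Encoding"))   # chunked
```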
Expected Behavior:
The content-length header should not be unilaterally removed by Envoy + Envoy AI Gateway.
Repro steps:
The gcp-anthropic model throws INVALID_ARGUMENT when the chunked transfer-encoding header is set.
Note: the transfer-encoding: chunked header IS set in this request:
```shell
curl --request POST \
  --url https://us-east5-aiplatform.googleapis.com/v1/projects/<PROJECT-NAME>/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku@20241022:rawPredict \
  --header 'authorization: Bearer <TOKEN>' \
  --header 'content-type: application/json' \
  --header 'transfer-encoding: chunked' \
  --data '{"anthropic_version": "vertex-2023-10-16","messages": [{"role": "user","content": [{"type": "text","text": "What are you doing?"}]}],"max_tokens": 256,"stream": false}'
```
Output
```json
{
  "error": {
    "code": 400,
    "message": "Prediction on deployed model (endpoint_id: <redacted-endpoint-id>, deployed_model_id: <redacted-model-id>) failed with error: \"Bad Request\".",
    "status": "INVALID_ARGUMENT"
  }
}
```
Valid response when the content-length header is set.
Note: the transfer-encoding header is NOT set in this request:
```shell
curl --request POST \
  --url https://us-east5-aiplatform.googleapis.com/v1/projects/<PROJECT-NAME>/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku@20241022:rawPredict \
  --header 'authorization: Bearer <TOKEN>' \
  --header 'content-type: application/json' \
  --header 'user-agent: vscode-restclient' \
  --data '{"anthropic_version": "vertex-2023-10-16","messages": [{"role": "user","content": [{"type": "text","text": "What are you doing?"}]}],"max_tokens": 256,"stream": false}'
```
Output
```json
{
  "id": "<redacted-response-id>",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-5-haiku-20241022",
  "content": [
    {
      "type": "text",
      "text": "I want to be direct with you. I aim to help you by listening and responding to whatever task or conversation you would like to have. How can I assist you today?"
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "output_tokens": 38
  }
}
```
Environment:
Logs:
Thanks to @cmaddalozzo for helping investigate.