Description:
ProcessRequestHeaders() in the UpstreamFilter extproc explicitly sets the return status to CONTINUE_AND_REPLACE here. This causes Envoy to drop the content-length header, switching the request to chunked transfer encoding. Refer to the envoy-code.
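For reference, the ext_proc response in question looks roughly like the sketch below (prototext, field names per the envoy.service.ext_proc.v3 API; this is an illustration, not the gateway's actual code). Per the Envoy docs, CONTINUE_AND_REPLACE signals that the body will be replaced, so Envoy clears content-length:

```textproto
# Sketch of an envoy.service.ext_proc.v3.ProcessingResponse for the
# request-headers phase.
request_headers {        # HeadersResponse
  response {             # CommonResponse
    # CONTINUE keeps the original message framing (content-length preserved).
    # CONTINUE_AND_REPLACE tells Envoy the body will be replaced, so Envoy
    # removes content-length and the upstream request becomes chunked.
    status: CONTINUE_AND_REPLACE
  }
}
```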
Anthropic models hosted on GCP do not support chunked transfer encoding and fail with the error below:
```json
{
  "error": {
    "code": 400,
    "message": "Prediction on deployed model (endpoint_id: <redacted-endpoint-id>, deployed_model_id: <redacted-model-id>) failed with error: \"Bad Request\".",
    "status": "INVALID_ARGUMENT"
  }
}
```
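The framing change itself can be reproduced outside Envoy entirely. As a minimal illustration (plain Python stdlib, not gateway code): an HTTP/1.1 client sends content-length when it knows the body size, and falls back to transfer-encoding: chunked when it does not — which is what happens to the upstream request once Envoy drops the header:

```python
# Minimal illustration (stdlib only): a client that cannot determine the
# body length chunk-encodes the request instead of sending Content-Length.
import http.client
import http.server
import threading

captured = []  # request headers observed by the local test server

class Handler(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        captured.append(dict(self.headers))
        # Drain the body so the connection stays clean.
        if "chunked" in self.headers.get("Transfer-Encoding", ""):
            while True:
                size = int(self.rfile.readline().strip() or b"0", 16)
                self.rfile.read(size + 2)  # chunk data + trailing CRLF
                if size == 0:
                    break
        else:
            self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

body = b'{"max_tokens": 256}'
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])

# Bytes body: the length is known, so Content-Length is sent.
conn.request("POST", "/", body=body)
conn.getresponse().read()

# Iterable body: the length is unknown, so the request is chunk-encoded.
conn.request("POST", "/", body=iter([body]))
conn.getresponse().read()

conn.close()
server.shutdown()

print("Content-Length" in captured[0])        # True
print(captured[1].get("Transfer-Encoding"))   # chunked
```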
Expected Behavior:
The content-length header should not be unilaterally removed by Envoy + Envoy AI Gateway.
Repro steps:
The gcp-anthropic model throws INVALID_ARGUMENT when the chunked transfer-encoding header is set.
Note: the transfer-encoding: chunked header IS set in this request:
```shell
curl --request POST \
  --url https://us-east5-aiplatform.googleapis.com/v1/projects/<PROJECT-NAME>/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku@20241022:rawPredict \
  --header 'authorization: Bearer <TOKEN>' \
  --header 'content-type: application/json' \
  --header 'transfer-encoding: chunked' \
  --data '{"anthropic_version": "vertex-2023-10-16","messages": [{"role": "user","content": [{"type": "text","text": "What are you doing?"}]}],"max_tokens": 256,"stream": false}'
```
Output
```json
{
  "error": {
    "code": 400,
    "message": "Prediction on deployed model (endpoint_id: <redacted-endpoint-id>, deployed_model_id: <redacted-model-id>) failed with error: \"Bad Request\".",
    "status": "INVALID_ARGUMENT"
  }
}
```
Valid response when the content-length header is set.
Note: the transfer-encoding header is NOT set in this request:
```shell
curl --request POST \
  --url https://us-east5-aiplatform.googleapis.com/v1/projects/<PROJECT-NAME>/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku@20241022:rawPredict \
  --header 'authorization: Bearer <TOKEN>' \
  --header 'content-type: application/json' \
  --header 'user-agent: vscode-restclient' \
  --data '{"anthropic_version": "vertex-2023-10-16","messages": [{"role": "user","content": [{"type": "text","text": "What are you doing?"}]}],"max_tokens": 256,"stream": false}'
```
Output
```json
{
  "id": "<redacted-response-id>",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-5-haiku-20241022",
  "content": [
    {
      "type": "text",
      "text": "I want to be direct with you. I aim to help you by listening and responding to whatever task or conversation you would like to have. How can I assist you today?"
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "output_tokens": 38
  }
}
```
Environment:
Logs:
Thanks to @cmaddalozzo for helping investigate.