8000 Protobuf default recursion depth limit of 32 - fails disk buffering, codec, and Vector source/sink · Issue #19315 · vectordotdev/vector · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Protobuf default recursion depth limit of 32 - fails disk buffering, codec, and Vector source/sink #19315

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
sbalmos opened this issue Dec 5, 2023 · 1 comment
Labels
domain: codecs Anything related to Vector's codecs (encoding/decoding) type: enhancement A value-adding code change that enhances its existing functionality.

Comments

@sbalmos
Copy link
Contributor
sbalmos commented Dec 5, 2023

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Vector's protobuf decoder has an implied by-default recursion depth limit of 32. This will cause events that are nested deeper than this limit to fail delivery at any point in a pipeline that uses protobuf encoding - whether protobuf/native codec, using the Vector native source/sink, or using a sink with disk buffering enabled. Depending on the configured retry behavior, an event of this type may cause infinite retries, effectively stopping the pipeline as a poison event.

This is directly related to the default recursion limit of 100-deep in the Prost protobuf crate, and is resolved by enabling the no-recursion-limit prost crate feature.

The error message differs between the usage, whether from a vector source/sink, or from disk buffering.

Vector sink:
2023-12-05T19:13:00.920742Z WARN sink{component_kind="sink" component_id=vector_out component_type=vector}:request{request_id=2}: vector::sinks::util::retries: Retrying after error. error=Request failed: status: Internal, message: "failed to decode Protobuf message: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: Log.fields: EventWrapper.event: PushEventsRequest.events: recursion limit reached", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Tue, 05 Dec 2023 19:13:00 GMT", "content-length": "0"} } internal_log_rate_limit=true

(note the smoking gun error message of "recursion limit reached")

Any disk-buffered sink:
2023-12-05T19:17:32.250361Z ERROR sink{component_kind="sink" component_id=sender component_type=http}: vector_buffers::internal_events: Error encountered during buffer read. error=failed to decoded record: InvalidProtobufPayload error_code="decode_failed" error_type="reader_failed" stage="processing" internal_log_rate_limit=true

Configuration

Vector source/sink example:
--
test_recursion_sender.yaml:
data_dir: /tmp
api:
  enabled: true
  address: 0.0.0.0:8686
  playground: false
sources:
  stdin:
    type: stdin
transforms:
  parse_json:
    type: remap
    inputs:
      - stdin
    source: |-
      .message = parse_json!(string!(.message), 128)
sinks:
  sender:
    type: vector
    inputs:
      - parse_json
    address: 127.0.0.1:6000

test_recursion_receiver.yaml:
data_dir: /tmp
api:
  enabled: true
  address: 0.0.0.0:8687
  playground: false
sources:
  receiver:
    type: vector
    address: 0.0.0.0:6000
sinks:
  console_out:
    type: console
    target: stdout
    inputs:
      - receiver
    encoding:
      codec: json

Disk buffering example:
--
test_recursion_sender.yaml:
data_dir: /tmp
api:
  enabled: true
  address: 0.0.0.0:8686
  playground: false
sources:
  stdin:
    type: stdin
transforms:
  parse_json:
    type: remap
    inputs:
      - stdin
    source: |-
      .message = parse_json!(string!(.message), 128)
sinks:
  sender:
    type: http
    inputs:
      - parse_json
    uri: http://127.0.0.1:80
    encoding:
      codec: json
    buffer:
      type: disk
      max_size: 268435488

test_recursion_receiver.yaml:
data_dir: /tmp
api:
  enabled: true
  address: 0.0.0.0:8687
  playground: false
sources:
  receiver:
    type: http_server
    address: 0.0.0.0:80
    decoding:
      codec: json
sinks:
  console_out:
    type: console
    target: stdout
    inputs:
      - receiver
    encoding:
      codec: json

Version

vector 0.35.0-custom-ca1b6d14a (x86_64-unknown-linux-gnu debug=full)

Debug Output

No response

Example Data

Any trivial 32+-level-deep JSON structure is usable, such as below.

{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":false}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}

Additional Context

No response

References

#18130

@sbalmos sbalmos added the type: bug A code related bug. label Dec 5, 2023
@pront pront added type: enhancement A value-adding code change that enhances its existing functionality. domain: codecs Anything related to Vector's codecs (encoding/decoding) and removed type: bug A code related bug. labels Dec 8, 2023
@smo921
Copy link
smo921 commented Apr 17, 2024

We have been able to reproduce this locally with a trivial configuration on v0.35.1

sources/files.yaml:

type: file
include:
  - /log_files/*
read_from: beginning

transforms/debug.yaml:

type: remap
inputs:
  - files
source: |
  log(., level: "debug", rate_limit_secs: 500)
  
  m, e = parse_json(.message)
  if e == null {
    .message = m
  }
  log(., level: "debug", rate_limit_secs: 500)

sinks/blackhole.yaml

cat vector/sinks/blackhole.yaml 
type: blackhole
inputs:
  - debug
print_interval_secs: 10
buffer:
  max_size: 268435488
  type: disk
  when_full: block

Our test data is the same JSON object as above. The relevant output is captured below.

vector_playground-1  | 2024-04-17T18:39:29.454494Z DEBUG transform{component_kind="transform" component_id=debug component_type=remap}: vrl::stdlib::log: {"file":"/log_files/bad_file.json","host":"60efe62b5dca","message":"{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":{\"hi\":false}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}","source_type":"file","timestamp":"2024-04-17T18:39:29.454399Z"} internal_log_rate_secs=500 vrl_position=0

vector_playground-1  | 2024-04-17T18:39:29.454534Z DEBUG transform{component_kind="transform" component_id=debug component_type=remap}: vrl::stdlib::log: {"file":"/log_files/bad_file.json","host":"60efe62b5dca","message":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":false}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}},"source_type":"file","timestamp":"2024-04-17T18:39:29.454399Z"} internal_log_rate_secs=500 vrl_position=106

vector_playground-1  | 2024-04-17T18:39:29.454955Z ERROR sink{component_kind="sink" component_id=blackhole component_type=blackhole}: vector_buffers::internal_events: Error encountered during buffer read. error=failed to decoded record: InvalidProtobufPayload error_code="decode_failed" error_type="reader_failed" stage="processing" internal_log_rate_limit=true

As a work around we have set max_depth: 30 on the parse_json() call. This is similar to the issue noted in #20212

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: codecs Anything related to Vector's codecs (encoding/decoding) type: enhancement A value-adding code change that enhances its existing functionality.
Projects
None yet
Development

No branches or pull requests

3 participants
0