Bug: Tail latency barrier causes infrequent eviction, leading to memory usage oscillation · Issue #8723 · risingwavelabs/risingwave · GitHub
Bug: Tail latency barrier causes infrequent eviction, leading to memory usage oscillation #8723
Closed
@KeXiangWang

Describe the bug

Currently, we evict the operator cache only when an epoch finishes, i.e. when the operator receives a barrier. When an epoch lasts too long, the operator cache goes a long time without any eviction.
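
To make the behaviour concrete, here is a minimal toy model of barrier-only eviction. It is only a sketch under stated assumptions, not the actual RisingWave code: the names `OperatorCache`, `run_barrier_only`, `keys_per_chunk` and `limit` are hypothetical, and the "aggressive memory control" is simplified to "if the cache is over `limit` at a barrier, evict everything older than the next epoch".

```python
from collections import OrderedDict


class OperatorCache:
    """Toy cache (hypothetical): each entry is tagged with the epoch of its last access."""

    def __init__(self):
        self.entries = OrderedDict()  # key -> epoch of last access, oldest first

    def access(self, key, epoch):
        # Re-insert so that iteration order matches recency (oldest first).
        self.entries.pop(key, None)
        self.entries[key] = epoch

    def evict_before(self, watermark_epoch):
        # Drop every entry last touched before the watermark epoch.
        while self.entries:
            key, epoch = next(iter(self.entries.items()))
            if epoch >= watermark_epoch:
                break
            del self.entries[key]


def run_barrier_only(epochs, keys_per_chunk=100, limit=1000):
    """epochs: number of chunks in each epoch.

    Returns the cache size recorded after every chunk and after every barrier.
    """
    cache = OperatorCache()
    trace = []
    next_key = 0
    for epoch, num_chunks in enumerate(epochs):
        for _ in range(num_chunks):
            for _ in range(keys_per_chunk):
                cache.access(next_key, epoch)
                next_key += 1
            trace.append(len(cache.entries))  # no eviction between barriers
        # Barrier: the only point where eviction can run in this model.
        # Assumed policy: over the limit -> evict everything seen so far.
        if len(cache.entries) > limit:
            cache.evict_before(epoch + 1)
        trace.append(len(cache.entries))
    return trace


if __name__ == "__main__":
    trace = run_barrier_only([2, 50, 2])  # one long epoch between two short ones
    print("peak size:", max(trace), "size after long epoch:", trace[53])
```

In this model the cache grows unchecked for the whole long epoch and then collapses at the barrier, which is the sawtooth pattern described in this issue.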

Here are two typical scenarios:
In the first scenario, there are many operator cache misses, so requests fall through to the block cache. During compaction, entries in the block cache are invalidated, so the events of the in-flight epoch suffer severe long-tail latency. During that time, the operator cache does not evict at all. Once the epoch finishes, our aggressive memory control mechanism makes the operator cache evict almost everything at once, causing a sharp drop in memory usage.

The second scenario is the repeated OOM seen after a CN crashes for the first time. When a CN crashes, the k8s manager restarts the pod immediately. The newly arrived events may then need to read old data, which has two consequences: (1) reading the old data from remote storage takes a long time; (2) the old data occupies a large amount of memory. For these two reasons, the first epoch is long and memory-intensive, yet no eviction happens during it. At this point, the CN is likely to OOM.

To Reproduce

No response

Expected behavior

No response

Additional context

Possible solution:

  1. Evict after every chunk.
    Regarding eviction overhead: if the global watermark has not changed, the overhead is negligible. If it has changed, eviction is needed anyway, so the overhead is necessary.
    One problem is that, in some extreme scenarios, per-chunk eviction is still not enough.
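
The per-chunk proposal above could look roughly like the following sketch. It is an illustration under assumptions, not the existing implementation: `GlobalWatermark`, `PerChunkCache`, `evict_after_chunk` and the per-chunk `tick` used to tag entries are all hypothetical names. The point it shows is the fast path: when the global watermark has not moved since the last check, eviction costs one comparison.

```python
from collections import OrderedDict


class GlobalWatermark:
    """Hypothetical shared watermark, advanced by a memory controller."""

    def __init__(self):
        self.tick = 0


class PerChunkCache:
    def __init__(self, watermark):
        self.watermark = watermark
        self.entries = OrderedDict()  # key -> tick of last access, oldest first
        self.seen_watermark = watermark.tick
        self.scans = 0  # how many times eviction actually had to do work

    def access(self, key, tick):
        self.entries.pop(key, None)
        self.entries[key] = tick

    def evict_after_chunk(self):
        current = self.watermark.tick
        if current == self.seen_watermark:
            return 0  # fast path: watermark unchanged, nothing to do
        self.seen_watermark = current
        self.scans += 1
        evicted = 0
        # Entries are ordered oldest first, so stop at the first one to keep.
        while self.entries:
            key, tick = next(iter(self.entries.items()))
            if tick >= current:
                break
            del self.entries[key]
            evicted += 1
        return evicted


def demo():
    wm = GlobalWatermark()
    cache = PerChunkCache(wm)
    for chunk in range(10):  # ten chunks, all inside one long epoch
        for i in range(100):
            cache.access((chunk, i), tick=chunk)
        if chunk == 5:
            wm.tick = 4  # assumed: the memory controller raises the watermark mid-epoch
        cache.evict_after_chunk()
    return cache.scans, len(cache.entries)


if __name__ == "__main__":
    print(demo())
```

In this run only one of the ten per-chunk calls does any real work; the other nine return on the fast path. The sketch does not address the extreme scenarios mentioned above, where a single chunk is itself too memory-intensive.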

Please comment if you have any better ideas.

Labels: type/bug