RocksDB massively exceeds memory limits. Potential memory leak #3216
Closed
@christian-esken

Description


I have a RocksDB database of 400 GB. It uses 40 GB RSS and keeps growing, eating up all the swap. Eventually it consumes all memory. The RocksDB version is 5.8.6.

Expected behavior

According to https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB the RocksDB instance should only use about 8 GB. I have done the calculations for a worst-case scenario:

Index size: 4.61 GB
Bloom filters: 3.9 GB
Memtables: 10 * 128 MB = 1.28 GB
Block cache: 0.5 GB
Blocks pinned by iterators: 14 MB
Sum: approximately 10.3 GB

I also added memory not mentioned on the wiki, namely:
Memtables pinned by iterators: 1.28 GB (an estimate, presuming they pin some older memtables)
Compaction (I understand it can temporarily double the index and bloom filter memory): 4.61 GB + 3.9 GB
Sum: approximately 9.8 GB

Altogether, taking the worst-case scenario, this is about 20 GB.
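For reference, RocksDB can report its own accounting for most of these items. The following is a minimal sketch of how we dump the relevant DB properties through rocksdbjni (the property names are the standard DB properties from the wiki; exact availability may vary between versions):

```java
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class MemoryAccounting {
    // Print RocksDB's own memory accounting for comparison with RSS.
    static void printMemoryProperties(RocksDB db) throws RocksDBException {
        String[] props = {
            "rocksdb.estimate-table-readers-mem", // index and bloom filter blocks
            "rocksdb.cur-size-all-mem-tables",    // active + unflushed memtables
            "rocksdb.size-all-mem-tables",        // additionally includes pinned memtables
            "rocksdb.block-cache-usage",          // blocks held in the block cache
            "rocksdb.block-cache-pinned-usage"    // cache entries pinned, e.g. by iterators
        };
        for (String p : props) {
            System.out.println(p + " = " + db.getProperty(p));
        }
    }
}
```

If these numbers stay near the calculation above while RSS keeps growing, the excess memory is not tracked by RocksDB's own accounting.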

Actual behavior

RocksDB uses far more memory than expected and eventually consumes all of it: even shortly after start, it requires 33 GB RSS.

Detail: swap fills up before buffers/cache. As you can see below, swap is full (7 GB) while there is still plenty of data cached (14 GB), so I suspect RocksDB is pinning data in memory. The memory loss happens while reading from the DB via a prefix iterator (seek, next): when we only write (on average 100 MB/s to SSD), we do not lose any memory.

dbdir # free -m
              total        used        free      shared  buff/cache   available
Mem:          64306       49196         293          12       14816       14440
Swap:          7710        7710           0

RocksDB is embedded in Java via rocksdbjni. The Java part requires less than 1 GB of heap. Memory usage of the process, as reported by top:
VIRT 0.450t
RES  0.048t
SHR  0.012t

I ran pmap on the process and summed up the RSS of the *.sst mappings: the total for *.sst files is 34 GB.

Steps to reproduce the behavior

We can reproduce the behavior in our application quickly. It never happens if we only write; as soon as we allow clients to read, the memory loss starts. The only operations on this DB are put, prefix-iterator seek() + next(), and deleteRange.
One important observation: we use iterators only for a short time, then close them (to avoid resources being held by the iterator) and create a new one. If we choose a shorter time before discarding an iterator, we create more iterators and the memory loss is faster.
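For clarity, here is a simplified sketch of our read path (a sketch only: it assumes a prefix extractor is configured on the Options, and keyPrefix and the counting stand in for our actual processing):

```java
import org.rocksdb.ReadOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksIterator;

public class PrefixScan {
    // Short-lived prefix iterator, closed after each scan.
    static int scanPrefix(RocksDB db, byte[] keyPrefix) {
        int count = 0;
        try (ReadOptions ro = new ReadOptions().setPrefixSameAsStart(true);
             RocksIterator it = db.newIterator(ro)) {
            // With a prefix extractor configured, the iterator becomes
            // invalid once it leaves the prefix of the seek key.
            for (it.seek(keyPrefix); it.isValid(); it.next()) {
                count++; // process it.key() / it.value() here
            }
        } // iterator and ReadOptions are closed here
        return count;
    }
}
```

Both the iterator and the ReadOptions hold native resources, so we close them deterministically via try-with-resources rather than waiting for finalization.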
