vine: possible hash table inefficiencies #4115
Comments
+1 for writing a benchmark that evaluates the performance of the hash table independently of TaskVine. Perhaps @Ian2327 would like to take a crack at that? |
Sure! I'll take that challenge. Should I write the test in C or some other language? |
I think C is a good language in which to do the test. Basically, we need a benchmark that measures the time to iterate the hash table for 100K, 200K ... 900K items. Then start removing them and measure the iteration time. Does it go back down? |
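For reference, here is a minimal sketch of that kind of benchmark, assuming the cctools hash_table API declared in dttools/src/hash_table.h (hash_table_create, hash_table_insert, hash_table_remove, hash_table_firstkey/hash_table_nextkey, hash_table_size, hash_table_delete); the timing helper and key format are just illustrative choices, not the actual test written later in this thread.

```c
/* Sketch of a standalone benchmark for the cctools hash table:
 * insert items in steps of 100K up to 900K, timing a full iteration
 * at each step, then remove them in steps and time the iteration
 * again to see whether it returns to its original cost. */

#define _POSIX_C_SOURCE 200809L

#include "hash_table.h"

#include <stdio.h>
#include <time.h>

static double wall_time(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static double time_iteration(struct hash_table *h)
{
    char *key;
    void *value;
    double start = wall_time();
    hash_table_firstkey(h);
    while (hash_table_nextkey(h, &key, &value)) {
        /* nothing to do: we only care about traversal cost */
    }
    return wall_time() - start;
}

int main(void)
{
    struct hash_table *h = hash_table_create(0, 0);
    char key[64];

    for (int n = 100000; n <= 900000; n += 100000) {
        /* grow the table to n items */
        for (int i = hash_table_size(h); i < n; i++) {
            sprintf(key, "key-%d", i);
            hash_table_insert(h, key, (void *) 1);
        }
        printf("items %7d  iterate %.6f s\n", n, time_iteration(h));
    }

    for (int n = 800000; n >= 0; n -= 100000) {
        /* remove items again: does iteration time recover? */
        for (int i = hash_table_size(h) - 1; i >= n; i--) {
            sprintf(key, "key-%d", i);
            hash_table_remove(h, key);
        }
        printf("items %7d  iterate %.6f s\n", n, time_iteration(h));
    }

    hash_table_delete(h);
    return 0;
}
```

Compiling this together with dttools' hash_table.c (or linking against the dttools library) should be enough to run it.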
I should be testing the |
That's correct. |
These are the results I got:
It doesn't seem like the iteration time after removing the elements returns to the range it was in before the elements were added. |
After reading through the |
Correct, we only implemented the doubling because at the time we didn't see a reason to shrink. Also, I'm not convinced that this is really an issue, or at least that it is causing the observed slowdown. The reason is that the manager still spends some time in link_poll, which means that it is working under capacity, so this particular bottleneck is most likely coming from somewhere else. |
While the total impact on taskvine is unclear, I think this is a clear deficiency in the data structure, and it will be a good exercise for @Ian2327 to fix it. Ian, please go ahead and work up a PR, and then let's discuss. Yes, you will have to implement the halving function, and @btovar is correct that some hysteresis will be needed. |
Do we know which table is causing this? It may be that the tables are not really empty, but taskvine needs to keep tabs on all that info (e.g., files that have not been undeclared), or there is a memory leak where we are not deleting things from the tables when we should. Which type of workflow is this? Are there dependencies among the tasks? We have run workflows bigger than that before, and the hash tables were not such a bottleneck. I'm a little hesitant to add more dynamic complexity to the code without being sure that this is really the bottleneck we think it is. |
@btovar I am 60% sure it was the |
But at most you have O(1000) workers joined, right? That should be ok for these tables? |
Yes, I agree with this, it could be that the manager got stuck with something else and gave this code a high priority to run. |
In this case, I think we could change |
In fact |
But it calls |
Yes, you are correct, good catch. |
@Ian2327 we will continue to argue about what contributes to TaskVine performance. :) In the meantime, I would like you to write and test a modification to the hash table that shrinks the table by 1/2 whenever the load falls below 1/8. Then let's see what that performance looks like. |
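Purely as a sketch of what that could look like in a chained hash table: the struct layout and names below (toy_hash_table, toy_resize, MIN_BUCKETS) are invented for illustration and are not the actual cctools internals. The policy is the interesting part: after a removal, if the load (size / bucket_count) has fallen below 1/8, rebuild the table with half as many buckets.

```c
#include <stdlib.h>
#include <string.h>

#define MIN_BUCKETS 127  /* floor: never shrink below the initial size */

struct entry {
    char *key;
    void *value;
    struct entry *next;
};

struct toy_hash_table {
    struct entry **buckets;
    int bucket_count;
    int size;
    unsigned (*hash)(const char *key);
};

/* Rebuild the table with a new bucket count, rehashing every entry. */
static void toy_resize(struct toy_hash_table *h, int new_count)
{
    struct entry **new_buckets = calloc(new_count, sizeof(*new_buckets));
    if (!new_buckets) return; /* keep the old table on allocation failure */

    for (int i = 0; i < h->bucket_count; i++) {
        struct entry *e = h->buckets[i];
        while (e) {
            struct entry *next = e->next;
            unsigned b = h->hash(e->key) % new_count;
            e->next = new_buckets[b];
            new_buckets[b] = e;
            e = next;
        }
    }
    free(h->buckets);
    h->buckets = new_buckets;
    h->bucket_count = new_count;
}

/* Remove a key; afterwards, shrink by half if the load drops below 1/8. */
void *toy_remove(struct toy_hash_table *h, const char *key)
{
    unsigned b = h->hash(key) % h->bucket_count;
    struct entry **link = &h->buckets[b];

    for (struct entry *e = *link; e; link = &e->next, e = e->next) {
        if (strcmp(e->key, key) == 0) {
            void *value = e->value;
            *link = e->next;
            free(e->key); /* assumes keys were strdup'd at insert time */
            free(e);
            h->size--;

            /* shrink by half when the load falls below 1/8 */
            if (h->bucket_count > MIN_BUCKETS &&
                h->size * 8 < h->bucket_count) {
                toy_resize(h, h->bucket_count / 2);
            }
            return value;
        }
    }
    return 0;
}
```

Because a sensible grow threshold sits well above 1/8 (halving at 1/8 leaves the load at 1/4), the gap between the two thresholds provides the hysteresis mentioned earlier, so a size oscillating around one boundary does not make the table repeatedly grow and shrink.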
This is what I got after implementing the bucket halving:
This was the run before the implementation:
|
Hmm, maybe increase the number of items by a factor of 100 to get more consistent runtimes? |
This is what I got:
(without halving):
|
I guess you won't see a significant performance difference since you're only testing insertion and removal, both of which are amortized constant-time operations. Also, the halving threshold doesn't seem quite right: in the removal phase, when 10,000,000 items remain, the number of buckets has only dropped to 133,169,152, so the load is 0.075, still below the 1/8 threshold, which should have triggered further halving. And when all items are removed, the bucket count is not reduced at all. |
There was a slight issue with the order in which I was printing things.
(Without halving):
|
I get something like this if I end up calling the
The number of buckets stops at 33,292,288 since it halves at most once before each iteration.
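If the goal is for the bucket count to track the load all the way down, one option (again using the invented names from the sketch above, not the real cctools code) is to halve in a loop until the load rises back above the threshold or the minimum bucket count is reached:

```c
/* Illustrative only: shrink repeatedly rather than at most once, so a
 * table that has become badly oversized is brought back in line as soon
 * as the check runs.  toy_resize and MIN_BUCKETS are the invented
 * helpers from the earlier sketch. */
static void toy_maybe_shrink(struct toy_hash_table *h)
{
    while (h->bucket_count > MIN_BUCKETS && h->size * 8 < h->bucket_count) {
        toy_resize(h, h->bucket_count / 2);
    }
}
```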
If I incorporate the halving into the
|
Result of gprof:
with this output:
Insertion Phase:
Removal Phase:
|
I observed a significant slowdown toward the end of a run with 880K tasks in total. With approximately 90% of tasks completed, 1000 available cores, fewer than 100 tasks running concurrently, and thousands of tasks remaining in the ready queue, the system became sluggish.
The progress bar advanced very slowly: only one or two tasks completed every 10 seconds, compared with the initial stage, when hundreds of tasks completed every second.
To investigate, I used perf to monitor which function the manager spent most of its time on. The commands were:
And here is the output:
It looks like hash table iteration became extremely expensive once the table had held a large number of elements over its lifetime.
My hypothesis was that the table capacity is expanded every time the growth threshold is triggered, and elements come and go over time, but the table is never shrunk. As a result, after holding many elements, the number of valid elements in the hash table is small, but the table itself has become very large due to previous expansions, making the final iterations very time-consuming.
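That hypothesis lines up with how iteration over a chained table typically works: the iterator must scan every bucket, empty or not, so a full pass costs time proportional to the number of buckets plus the number of elements. A sketch using the same invented names as the earlier examples (not the actual cctools iterator):

```c
/* Illustrative iterator over the toy table from the sketches above: it
 * must scan every bucket, including empty ones, so a full pass costs
 * time proportional to the bucket count even when few elements remain. */
void toy_iterate(struct toy_hash_table *h,
                 void (*visit)(const char *key, void *value))
{
    for (int i = 0; i < h->bucket_count; i++) {
        for (struct entry *e = h->buckets[i]; e; e = e->next) {
            visit(e->key, e->value);
        }
    }
}
```

With only a small fraction of elements left in a table sized for its peak, nearly all of the iteration time goes to skipping empty buckets.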
I implemented a shrinking strategy to clean up redundant capacity when the number of elements is low. Combined with some other changes, I saw improved performance, but I am not sure whether the shrinking itself is what had the positive impact.
I need to write a separate program to test whether the hash table becomes sluggish under large-scale scenarios.