vine: reaccounting disk allocation of tasks in workers · Issue #4063 · cooperative-computing-lab/cctools · GitHub
By default, a worker reports its total disk usage to the manager as the disk usage of its cache plus its tasks' disk allocations. However, if a task's inputs are already in the cache, the cached inputs are counted twice: once in the vine cache and once in the task's disk allocation.
For example, a worker W with 30 GB of disk is assigned a task T1 with a 20 GB disk allocation and 19 GB of cacheable input files. To run T1, W fetches T1's 19 GB of cacheable input files into its cache. W then reports to the manager a total disk usage of vine cache + task disk allocation = 19 GB + 20 GB = 39 GB, while the true disk usage is 19 GB (from the cache) plus whatever files in T1's sandbox are not cached. As a result, the manager stops sending tasks to W even though W still has room for them.
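The arithmetic in this example can be reproduced with a short sketch. The function names are hypothetical and not part of cctools, and the 1 GB figure for uncached sandbox files is an assumption chosen only to make the comparison concrete.

```python
def reported_disk_gb(cache_gb, task_allocations_gb):
    """Usage as the worker reports it today: cache plus every task allocation."""
    return cache_gb + sum(task_allocations_gb)


def true_disk_gb(cache_gb, uncached_sandbox_gb):
    """Usage on disk: cache plus only the sandbox files that are not cached."""
    return cache_gb + sum(uncached_sandbox_gb)


worker_disk_gb = 30                       # W's total disk
reported = reported_disk_gb(19, [20])     # 19 GB cache + 20 GB allocation for T1
actual = true_disk_gb(19, [1])            # assumes T1 holds 1 GB of uncached files

print(f"reported={reported} GB, actual={actual} GB, disk={worker_disk_gb} GB")
```

Under these assumptions the reported figure exceeds W's 30 GB disk while the actual usage does not, which is why W looks full to the manager.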
To fix this, when the manager matches a task to a worker, it should reduce the task's disk allocation by the size of any input files that are already cached. In the example above, the manager would adjust T1's disk allocation from 20 GB to (20 - 19) = 1 GB.
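A minimal sketch of this proposed adjustment, assuming the manager knows each input file's size and which files the worker has cached. The function name and the file name `dataset.tar` are hypothetical, not taken from cctools.

```python
def adjusted_allocation_gb(task_allocation_gb, input_sizes_gb, cached_names):
    """Subtract the size of already-cached inputs from the task's allocation."""
    cached_gb = sum(
        size for name, size in input_sizes_gb.items() if name in cached_names
    )
    # Never let the adjusted allocation go negative.
    return max(0, task_allocation_gb - cached_gb)


t1_inputs = {"dataset.tar": 19}  # hypothetical 19 GB cacheable input
adjusted = adjusted_allocation_gb(20, t1_inputs, {"dataset.tar"})
unchanged = adjusted_allocation_gb(20, t1_inputs, set())
print(adjusted, unchanged)
```

With the input cached, the allocation drops to 1 GB; with nothing cached, it stays at 20 GB.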
Again, this is just another example of how we are not carefully adhering to a consistent underlying model of storage management.
At the worker:
Input files go in the cache.
Input files are linked into sandboxes.
Sandboxes contain only intermediate and output files.
And so:
The sandbox allocation has nothing to do with the size of input files. It only contains intermediate and output files.
The worker's storage consumption is the size of the cache plus the sum of all sandboxes.
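The model above can be written down as a small data structure. This is only an illustrative sketch: the class and field names are hypothetical, and the sizes reuse the T1 example with a sandbox that holds only intermediate and output files.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerStorage:
    # Cached input files: file name -> size in GB.
    cache_gb: dict = field(default_factory=dict)
    # Sandbox allocations (intermediate and output files only): task id -> GB.
    sandboxes_gb: dict = field(default_factory=dict)

    def consumption_gb(self):
        """Size of the cache plus the sum of all sandboxes."""
        return sum(self.cache_gb.values()) + sum(self.sandboxes_gb.values())


w = WorkerStorage(cache_gb={"dataset.tar": 19}, sandboxes_gb={"T1": 1})
total = w.consumption_gb()
print(total)
```

Because inputs are linked into the sandbox rather than copied, each input file is counted once, in the cache.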
However:
When the manager wants to send a task to a worker, it should check that the available space is big enough for the sandbox PLUS the size of input files not already present. (Note that this is not the same as just making the sandbox bigger.)
The manager should not take the user's sandbox size of 20GB and reduce it by 19GB. The sandbox size should have been 1GB in the first place, and the manager should then further account for the size of the input files needed.
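The manager-side check described here, sandbox size plus the input files the worker does not yet hold, might look roughly like the following. It is a sketch under the storage model above, with hypothetical names rather than the actual cctools scheduler code.

```python
def task_fits(worker_disk_gb, cache_gb, sandboxes_gb, task_sandbox_gb, task_inputs_gb):
    """Return True if the worker has room for the sandbox plus missing inputs."""
    used = sum(cache_gb.values()) + sum(sandboxes_gb.values())
    available = worker_disk_gb - used
    # Only inputs that are not already in the worker's cache need new space.
    missing_inputs = sum(
        size for name, size in task_inputs_gb.items() if name not in cache_gb
    )
    return available >= task_sandbox_gb + missing_inputs


# Empty 30 GB worker: T1 needs a 1 GB sandbox plus 19 GB of inputs.
fits_empty = task_fits(30, {}, {}, 1, {"dataset.tar": 19})

# T1 is running and its input is cached: 10 GB remain available.
cache = {"dataset.tar": 19}
sandboxes = {"T1": 1}
fits_shared = task_fits(30, cache, sandboxes, 1, {"dataset.tar": 19})  # reuses cache
fits_other = task_fits(30, cache, sandboxes, 1, {"other.tar": 15})     # needs 16 GB
print(fits_empty, fits_shared, fits_other)
```

A second task that shares the cached input fits easily, while a task that needs a different 15 GB input does not; the sandbox size itself never changes.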
Points of contact: @tphung3 @colinthomas-z80