Dynamically connect/detect persistent volume paths and claims · Issue #5 · Duke-GCB/calrissian · GitHub

Closed
dleehr opened this issue Jan 9, 2019 · 3 comments
@dleehr (Member) commented Jan 9, 2019

Currently, the job builder assumes 4 specific PVCs and their mount points:

def populate_demo_values(self):
    # TODO: fetch these from the kubernetes API since they are attached to this pod
    self.add_persistent_volume_entry('/calrissian/input-data', 'calrissian-input-data')
    self.add_persistent_volume_entry('/calrissian/output-data', 'calrissian-output-data')
    self.add_persistent_volume_entry('/calrissian/tmptmp', 'calrissian-tmp')
    self.add_persistent_volume_entry('/calrissian/tmpout', 'calrissian-tmpout')

These could either be looked up dynamically through the k8s API, or provided in a ConfigMap.

Not terribly urgent if we continue the convention, but it is a kludge.
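The dynamic-lookup option would amount to reading this pod's spec and pairing each container volumeMount with the PVC-backed volume behind it. A minimal sketch of that pairing logic (the function name and the flattened dict shapes are hypothetical; in practice the data would come from the Kubernetes API, e.g. reading the pod's own spec):

```python
def match_claims_to_mounts(volumes, volume_mounts):
    """Pair container mount paths with the PVC claims backing them.

    volumes: the pod spec's PVC-backed volumes, flattened to
        {'name': ..., 'claim_name': ...}.
    volume_mounts: the container's volumeMounts, flattened to
        {'name': ..., 'mount_path': ...}.
    Returns {mount_path: claim_name}.
    """
    claims_by_volume = {v['name']: v['claim_name'] for v in volumes}
    return {m['mount_path']: claims_by_volume[m['name']]
            for m in volume_mounts
            if m['name'] in claims_by_volume}

# With the current convention, this recovers the four hardcoded entries:
volumes = [
    {'name': 'input-data', 'claim_name': 'calrissian-input-data'},
    {'name': 'output-data', 'claim_name': 'calrissian-output-data'},
    {'name': 'tmptmp', 'claim_name': 'calrissian-tmp'},
    {'name': 'tmpout', 'claim_name': 'calrissian-tmpout'},
]
mounts = [
    {'name': 'input-data', 'mount_path': '/calrissian/input-data'},
    {'name': 'output-data', 'mount_path': '/calrissian/output-data'},
    {'name': 'tmptmp', 'mount_path': '/calrissian/tmptmp'},
    {'name': 'tmpout', 'mount_path': '/calrissian/tmpout'},
]
print(match_claims_to_mounts(volumes, mounts)['/calrissian/tmptmp'])  # calrissian-tmp
```

Mounts backed by anything other than a PVC (e.g. configMap or emptyDir volumes) are simply skipped by the filter.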

@dleehr (Member, Author) commented Jan 11, 2019

Stepping back a little: this design was essentially ported over from the local Docker implementation. Some quick notes to consider on each of these entries:

  • input-data: should be fine as-is. Obviously some CommandLineJob containers will need to access input data when running workflow steps.
  • output-data: should be fine as-is. But I don't think the CommandLineJobs actually write directly here; I believe that's done at a higher level (it would happen inside the calrissian process, not in a job it schedules).
  • tmptmp: I don't know that /tmp needs to be a volume mount at all, and it probably doesn't need to be a persistent read-write-many volume mount. In fact, that may be the opposite of what we want. The current convention was adapted from cwltool's local Docker volume mounting without much scrutiny. There are reasons for mounting /tmp as a volume inside a local Docker container, but those might not be valid for mounting /tmp across a network to a storage cluster. Certainly needs some attention. Also, I named it tmptmp because /tmp as a string prefix would also match /tmpout, and the lookup code is simple.
  • tmpout: probably fine initially. Containers will mount paths under here to pass intermediate step data. The current implementation uses one big shared volume and mounts distinct subpaths as needed by containers. This could see a lot of I/O depending on the workload, and I'm not sure how the storage will behave. We could look at creating PVCs for each container's tmpout, but then we would have to keep track of those.
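For reference, the "one big shared volume, distinct subpaths per container" approach for tmpout maps onto Kubernetes subPath mounts. A sketch of the manifest fragment each step container would get (dict form; the volume name, mount path, and helper are hypothetical, not the current implementation):

```python
def tmpout_mount(step_name):
    """Build a volumeMount fragment giving one step its own slice of
    the shared calrissian-tmpout claim via subPath, so concurrent
    steps don't collide inside the volume."""
    return {
        'name': 'tmpout',                         # references the pod-level volume
        'mountPath': '/calrissian/tmpout/%s' % step_name,
        'subPath': step_name,                     # isolates this step's directory
    }
```

Per-step PVCs would replace the `subPath` key with a distinct `claim_name` per step, at the cost of tracking (and cleaning up) those claims.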

@dleehr (Member, Author) commented Jan 18, 2019

Suggestion on how to handle /tmp: Use emptyDir.

An emptyDir volume is first created when a Pod is assigned to a Node, and exists as long as that Pod is running on that node. As the name says, it is initially empty... When a Pod is removed from a node for any reason, the data in the emptyDir is deleted forever.

This is probably the closest in intent to what CWL does with local docker.
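In manifest terms, this means declaring a pod-level emptyDir volume plus a /tmp mount in each container, instead of any PVC. A sketch of the two fragments (dict form; names are illustrative):

```python
def tmp_emptydir():
    """Return (volume, volumeMount) fragments that back /tmp with an
    emptyDir: node-local scratch created with the pod and deleted when
    the pod goes away, with no network storage involved."""
    volume = {'name': 'tmp', 'emptyDir': {}}          # goes in pod spec .volumes
    mount = {'name': 'tmp', 'mountPath': '/tmp'}      # goes in container .volumeMounts
    return volume, mount
```

This also removes the tmptmp PVC and its read-write-many requirement entirely.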

@johnbradley (Contributor) commented

Moved remaining issues to #30 and #31.
