Dynamically connect/detect persistent volume paths and claims · Issue #5 · Duke-GCB/calrissian · GitHub

Closed
dleehr opened this issue Jan 9, 2019 · 3 comments
@dleehr (Member) commented Jan 9, 2019

Currently, the job builder assumes 4 specific PVCs and their mount points:

def populate_demo_values(self):
    # TODO: fetch these from the kubernetes API since they are attached to this pod
    self.add_persistent_volume_entry('/calrissian/input-data', 'calrissian-input-data')
    self.add_persistent_volume_entry('/calrissian/output-data', 'calrissian-output-data')
    self.add_persistent_volume_entry('/calrissian/tmptmp', 'calrissian-tmp')
    self.add_persistent_volume_entry('/calrissian/tmpout', 'calrissian-tmpout')

These could either be looked up dynamically through the k8s API, or provided in a ConfigMap.

Not terribly urgent if we continue the convention, but it is a kludge.
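The dynamic-lookup option would amount to reading this pod's spec and pairing each container volumeMount with the PVC-backed volume behind it. A minimal sketch of that pairing logic (the function name and the flattened dict shapes are hypothetical; in practice the data would come from the Kubernetes API, e.g. reading the pod's own spec):

```python
def match_claims_to_mounts(volumes, volume_mounts):
    """Pair container mount paths with the PVC claims backing them.

    volumes: the pod spec's PVC-backed volumes, flattened to
        {'name': ..., 'claim_name': ...}.
    volume_mounts: the container's volumeMounts, flattened to
        {'name': ..., 'mount_path': ...}.
    Returns {mount_path: claim_name}.
    """
    claims_by_volume = {v['name']: v['claim_name'] for v in volumes}
    return {m['mount_path']: claims_by_volume[m['name']]
            for m in volume_mounts
            if m['name'] in claims_by_volume}

# With the current convention, this recovers the four hardcoded entries:
volumes = [
    {'name': 'input-data', 'claim_name': 'calrissian-input-data'},
    {'name': 'output-data', 'claim_name': 'calrissian-output-data'},
    {'name': 'tmptmp', 'claim_name': 'calrissian-tmp'},
    {'name': 'tmpout', 'claim_name': 'calrissian-tmpout'},
]
mounts = [
    {'name': 'input-data', 'mount_path': '/calrissian/input-data'},
    {'name': 'output-data', 'mount_path': '/calrissian/output-data'},
    {'name': 'tmptmp', 'mount_path': '/calrissian/tmptmp'},
    {'name': 'tmpout', 'mount_path': '/calrissian/tmpout'},
]
print(match_claims_to_mounts(volumes, mounts)['/calrissian/tmptmp'])  # calrissian-tmp
```

Mounts backed by anything other than a PVC (e.g. configMap or emptyDir volumes) are simply skipped by the filter.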

@dleehr (Member, Author) commented Jan 11, 2019

Stepping back a little: this design was essentially ported over from the local Docker implementation. Some quick notes to consider on each of these entries:

  • input-data: should be fine as-is. Obviously some CommandLineJob containers will need to access input data when running workflow steps.
  • output-data: should be fine as-is. But I don't think the CommandLineJobs actually write directly here; I believe that's done at a higher level (it would happen inside the calrissian process, not in a job it schedules).
  • tmptmp: I don't know that /tmp needs to be a volume mount at all, and it probably doesn't need to be a persistent read-write-many volume mount. In fact, that may be the opposite of what we want. The current convention was adapted from cwltool's local Docker volume mounting without much scrutiny. There are reasons for mounting /tmp as a volume inside a local Docker container, but those might not be valid for mounting /tmp across a network to a storage cluster. Certainly needs some attention. Also, I named it tmptmp because /tmp as a string prefix would also match /tmpout, and the lookup code is simple.
  • tmpout: probably fine initially. Containers will mount paths under here to pass intermediate step data. The current implementation uses one big shared volume and mounts distinct subpaths as needed by containers. This could see a lot of I/O depending on the workload, and I'm not sure how the storage will behave. We could look at creating PVCs for each container's tmpout, but then we would have to keep track of those.
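For reference, the "one big shared volume, distinct subpaths per container" approach for tmpout maps onto Kubernetes subPath mounts. A sketch of the manifest fragment each step container would get (dict form; the volume name, mount path, and helper are hypothetical, not the current implementation):

```python
def tmpout_mount(step_name):
    """Build a volumeMount fragment giving one step its own slice of
    the shared calrissian-tmpout claim via subPath, so concurrent
    steps don't collide inside the volume."""
    return {
        'name': 'tmpout',                         # references the pod-level volume
        'mountPath': '/calrissian/tmpout/%s' % step_name,
        'subPath': step_name,                     # isolates this step's directory
    }
```

Per-step PVCs would replace the `subPath` key with a distinct `claim_name` per step, at the cost of tracking (and cleaning up) those claims.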

@dleehr (Member, Author) commented Jan 18, 2019

Suggestion on how to handle /tmp: Use emptyDir.

An emptyDir volume is first created when a Pod is assigned to a Node, and exists as long as that Pod is running on that node. As the name says, it is initially empty... When a Pod is removed from a node for any reason, the data in the emptyDir is deleted forever.

This is probably the closest in intent to what CWL does with local docker.
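In manifest terms, this means declaring a pod-level emptyDir volume plus a /tmp mount in each container, instead of any PVC. A sketch of the two fragments (dict form; names are illustrative):

```python
def tmp_emptydir():
    """Return (volume, volumeMount) fragments that back /tmp with an
    emptyDir: node-local scratch created with the pod and deleted when
    the pod goes away, with no network storage involved."""
    volume = {'name': 'tmp', 'emptyDir': {}}          # goes in pod spec .volumes
    mount = {'name': 'tmp', 'mountPath': '/tmp'}      # goes in container .volumeMounts
    return volume, mount
```

This also removes the tmptmp PVC and its read-write-many requirement entirely.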

@johnbradley (Contributor) commented

Moved remaining issues to #30 and #31.
