Home · rabix/rabix Wiki · GitHub
This repository was archived by the owner on Apr 21, 2019. It is now read-only.
ntijanic edited this page Dec 23, 2014 · 7 revisions

Rabix is an open-source toolkit for developing tools and workflows for the common-workflow-language specification.

CWL is an informal task force consisting of people from various organizations that have an interest in portability of bioinformatics workflows. The goal is to specify a way to describe bioinformatics tools and workflows that is powerful, easy to use, and allows for portability of tools/workflows and reproducibility of runs. Join the mailing list if interested in contributing!

Rabix is based on the draft1 specification of CWL, but also includes some extensions of its own.

There are four components of the toolkit:

  • App registry, for storing (revisions of) tools and workflows.
  • Tool editor, for creating and editing tool descriptions.
  • Workflow editor, for creating and editing workflows.
  • Python executor, for running tools and workflows locally.

Creating tools

You can play around with the tool editor anonymously here. At any point, you can export the created JSON file.

For easier development, it's best to log in with your GitHub account and make a repository (repositories created on rabix are independent from GitHub repositories). Once inside your repository, click on "tools" and then on the "new tool" button.

While editing a tool that belongs to a rabix repository, the "update" button will create a new revision. You can use the URL on the tool page to run the latest revision (using the Python executor) or the URLs from revision pages to run a specific snapshot.

Creating a docker image

CWL currently only allows docker images for distribution of binaries. To create a docker image for your tool, see the docker documentation. If you are describing someone else's tool, check the docker hub first; someone may have already created an image.

For local development, it is enough to specify the image ID. Once you push the image to the docker registry, enter the imageRepo with optional tag (e.g. jsmith/mytoolkit#2.0).
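As a rough illustration of the imageRepo format, the sketch below splits a value such as jsmith/mytoolkit#2.0 into a repository and an optional tag. The function name split_image_repo is hypothetical, and the "#" separator is taken only from the example above; how Rabix itself parses this field is not shown here.

```python
def split_image_repo(image_repo):
    """Split 'repo#tag' into (repo, tag).

    Assumption: the tag, when present, follows the first '#' character,
    as in the example 'jsmith/mytoolkit#2.0'. The tag is None when absent.
    """
    repo, sep, tag = image_repo.partition("#")
    return repo, (tag if sep else None)


if __name__ == "__main__":
    print(split_image_repo("jsmith/mytoolkit#2.0"))
    print(split_image_repo("jsmith/mytoolkit"))
```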

Specifying required resources

You can enter the number of CPU cores and memory (in MB) required to run the tool. A CPU value of zero means that the tool can run multithreaded in any number of cores allocated for the run. If the required memory varies based on inputs or settings, you can use an expression in place of a constant. More on expressions later.

Describing inputs

In the "inputs" tab, click on the "+" button to add a new input. Tool inputs are files or parameters that get passed through the process arguments or stdin.

The input name must be unique for the tool. The type can be a primitive (string, number, boolean), an object (a structure with its own keys and values), or an array of such. For strings, you can enumerate allowed values by ticking the "enum" box.

In addition to these fields, you can configure some adapter fields that specify how this input will be propagated to the command line.

  • Order - position of the argument on the command line. Lower means first.
  • Prefix - For keyword arguments (e.g. --input or -p). Blank for positional arguments. For boolean values, only the prefix is added as an argument if the value is true. If the value is false or null, nothing is added to the process arguments.
  • Separator - Only applicable if prefix is specified. "Space" means pass as separate arguments.
  • Item separator - Only applicable for arrays. If blank, each item is repeated. If specified, string representations of items are concatenated with itemSeparator as a single process argument.
  • Value - Used if some manipulation is needed before passing the value to process. Click the </> button to enter a javascript expression that modifies the original value.
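The adapter fields above can be read as a small algorithm for turning input values into process arguments. The following Python sketch is one possible interpretation, not the executor's actual code: the names adapter_args and build_command and the dictionary keys are hypothetical, and two behaviours are assumptions (a non-space separator joins prefix and value into a single argument, and the prefix is repeated for each array item when no item separator is set).

```python
def adapter_args(value, prefix="", separator=" ", item_separator=None):
    """Return the list of process arguments produced by one input value."""
    # Null and false values add nothing to the process arguments.
    if value is None or value is False:
        return []
    # A true boolean contributes only its prefix.
    if value is True:
        return [prefix] if prefix else []
    if isinstance(value, list):
        if not item_separator:
            # Blank item separator: each item is repeated
            # (assumption: together with the prefix, if any).
            args = []
            for item in value:
                args.extend(adapter_args(item, prefix, separator))
            return args
        # Otherwise the items are concatenated into one argument.
        value = item_separator.join(str(item) for item in value)
    text = str(value)
    if not prefix:
        return [text]  # positional argument
    if separator == " ":
        return [prefix, text]  # "space" means two separate arguments
    # Assumption: any other separator yields a single argument.
    return [prefix + separator + text]


def build_command(base_command, inputs):
    """Order the inputs (lower first) and append their arguments."""
    args = list(base_command)
    for spec in sorted(inputs, key=lambda s: s["order"]):
        args.extend(
            adapter_args(
                spec["value"],
                spec.get("prefix", ""),
                spec.get("separator", " "),
                spec.get("item_separator"),
            )
        )
    return args


if __name__ == "__main__":
    example_inputs = [
        {"order": 2, "value": ["a.bam", "b.bam"], "prefix": "-i"},
        {"order": 1, "value": 4, "prefix": "--threads", "separator": "="},
        {"order": 3, "value": True, "prefix": "--verbose"},
    ]
    print(build_command(["mytool", "run"], example_inputs))
```

The "console" panel in the editor remains the authoritative preview; a sketch like this is only useful for reasoning about what a given combination of fields is likely to produce.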

Some tools may have inputs that are not passed to the underlying process. These are usually parameters used to tweak embedded expressions. To disable the adapter for a specific input, click the broken link icon next to its name.

The "Test" tab will be populated with a generated form according to described inputs. Enter some test values to preview the command line in the "console" panel.

Describing outputs

Similar to tool inputs, you can describe the outputs on the "outputs" tab. There are a few major differences:

  • Types are for the moment restricted to files and file arrays.
  • Adapters do not specify how to map values to process arguments. Rather, they specify how to create the output structure from files produced by the tool.
  • Specify a glob pattern to match files for that output. Can be a constant or an expression.
  • You can also specify key-value pairs for metadata of the generated file(s). Values can be expressions with the $self variable bound to the path matching the glob pattern.
  • Metadata can also be inherited from some input (before being overridden by above key-value pairs).
  • You can also specify a list of suffixes to "attach" any index files that may have been created.
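To make the output adapter rules more concrete, here is a minimal Python sketch of how an output structure might be assembled from the files a tool produced. It is an illustration under assumptions, not the executor's implementation: the function collect_output, its parameters, and the record keys "path", "metadata" and "index_files" are all hypothetical, and expressions are modelled as plain Python callables that receive the matched path (standing in for $self).

```python
import fnmatch


def collect_output(filenames, glob_pattern, metadata=None,
                   inherited=None, index_suffixes=()):
    """Build one record per file name that matches glob_pattern."""
    present = set(filenames)
    records = []
    for name in sorted(fnmatch.filter(filenames, glob_pattern)):
        # Start from metadata inherited from an input, then override it
        # with the explicitly specified key-value pairs.
        meta = dict(inherited or {})
        for key, value in (metadata or {}).items():
            # A callable stands in for an expression bound to the path.
            meta[key] = value(name) if callable(value) else value
        # "Attach" index files that exist next to the matched file.
        attached = [name + suffix for suffix in index_suffixes
                    if name + suffix in present]
        records.append({"path": name, "metadata": meta,
                        "index_files": attached})
    return records


if __name__ == "__main__":
    produced = ["sample.bam", "sample.bam.bai", "log.txt"]
    print(collect_output(
        produced,
        "*.bam",
        metadata={"sample": lambda path: path.split(".")[0]},
        inherited={"library": "L1"},
        index_suffixes=[".bai"],
    ))
```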

Unlike command line generation, there is currently no way to test output descriptions from the browser. The easiest way to test is to run the JSON file with some inputs using the "rabix" command. A URL for the latest revision of the tool JSON can be found on the tool page.

Additional adapter configuration

The "adapter" tab allows you to set the base command (array of process arguments, first of which is the executable path), stdin and stdout redirection, as well as specify input-independent adapters (same as adapters attached to inputs described above, except the "value" field must be specified).
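As a sketch of how the base command, generated arguments and redirections could combine into the single line shown in a console preview, consider the Python below. The function render_command_line and its parameters are hypothetical; the quoting uses the standard shlex module and is an assumption about presentation, not a description of what Rabix does internally.

```python
import shlex


def render_command_line(base_command, arguments, stdin=None, stdout=None):
    """Render a shell-style preview of the full command line."""
    # The base command comes first; its first element is the executable.
    parts = [shlex.quote(str(arg))
             for arg in list(base_command) + list(arguments)]
    if stdin:
        parts.append("< " + shlex.quote(stdin))
    if stdout:
        parts.append("> " + shlex.quote(stdout))
    return " ".join(parts)


if __name__ == "__main__":
    print(render_command_line(["bwa", "mem"], ["-t", "4", "ref.fa"],
                              stdout="out.sam"))
```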

Testing command line generation

As mentioned above, the "test" tab will hold a form generated according to input definitions. At any point, you can view the generated command line in the "console" panel. The panel is updated on any change in either tab.

The "allocated resources" section allows you to specify the amount of allocated CPU and RAM, also for the purpose of testing command line generation (since some arguments may be bound to these, e.g. --num-threads).

Expressions

In many places you are able to specify either a constant or an "expression". Clicking the </> button will pop up a dialog where you can write javascript code in two ways:

  • A one-liner JS expression. Example: $job.allocatedResources.cpu
  • A function block. Example: { var x=1; return x*2; }

If an expression starts with the "{" character, it must be a function block.

All expressions have the $job variable bound to the JobOrder structure (see specification for details). You can see the JobOrder JSON object by clicking on the Job button.

If an expression is used in the context of the "value" field of an input adapter, a $self variable will be bound to the value for that input. If used in the context of output metadata values, the $self variable is bound to the path matched by the glob pattern.
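The rule that separates one-liners from function blocks can be sketched in a few lines of Python. The helper below only produces JavaScript source text and does not evaluate it; wrap_expression is a hypothetical name, and wrapping both forms in a function that takes $job and $self is an assumption made for illustration, not the documented behaviour of the editor or executor.

```python
def wrap_expression(source):
    """Return JavaScript source for a function of ($job, $self).

    An expression starting with '{' is treated as a function block and
    used as the body as-is; anything else is treated as a one-liner
    whose value is returned.
    """
    text = source.strip()
    if text.startswith("{"):
        body = text
    else:
        body = "{ return (" + text + "); }"
    return "function ($job, $self) " + body


if __name__ == "__main__":
    print(wrap_expression("$job.allocatedResources.cpu"))
    print(wrap_expression("{ var x=1; return x*2; }"))
```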

Creating workflows

You can combine tools and workflows into a workflow DAG using the workflow editor. As with tools, you can play around anonymously and import/export the JSON, or log in and store workflow revisions in a rabix repository.

Common-workflow-language draft1 does not specify workflow descriptions, so we are using a custom specification. Draft2 is planned to specify workflows and introduce some backward-incompatible changes to tool descriptions as well.

Examples

Take a look at public apps for examples of tools and workflows.

Running tools and workflows

The Rabix toolkit includes a Python executor implementation. See installation and use instructions here.
