Working with the HPO directed acyclic graph (DAG)

First we include boilerplate code to define the project, set environment variables, and import libraries.

The HPO table is stored in the reference data. For each HPO term it includes the code, search terms, associated genes, and one or more parent HPO IDs (codes).

We will now create the relations from the hpo_ensgenes.tsv source that we will use in our examples: a relation for the HPO descriptions, a (parent,child) relation for the DAG, and a (hpo,gene_symbol) relation for mapping between HPO codes and gene symbols.

Let's start by viewing the nature of the DAG and annotating it with descriptions:
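To make the shape of these relations concrete, here is a minimal Python sketch that derives them from tab-separated text. It is not the notebook's own query: the column names (hpo_code, description, parents, genes), the comma-separated encoding of parents and genes, and every ID other than HP:0030126 are assumptions made up for illustration.

```python
import csv
import io

# Hypothetical sample rows. The real hpo_ensgenes.tsv layout may differ;
# only HP:0030126 is taken from the text, the other IDs are placeholders.
SAMPLE_TSV = (
    "hpo_code\tdescription\tparents\tgenes\n"
    "HP:0030126\troot term (placeholder)\t\tGENE1\n"
    "HP:X1\tchild term (placeholder)\tHP:0030126\tGENE2,GENE3\n"
    "HP:X2\tgrandchild term (placeholder)\tHP:X1\tGENE3\n"
)


def build_relations(tsv_text):
    """Return (descriptions, parent_child, hpo_gene) built from TSV text."""
    descriptions = {}   # hpo_code -> description
    parent_child = []   # (parent, child) edges of the DAG
    hpo_gene = []       # (hpo_code, gene_symbol) pairs
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        code = row["hpo_code"]
        descriptions[code] = row["description"]
        # A term may list several parents; an empty field means no parent.
        for parent in filter(None, row["parents"].split(",")):
            parent_child.append((parent, code))
        for gene in filter(None, row["genes"].split(",")):
            hpo_gene.append((code, gene))
    return descriptions, parent_child, hpo_gene


descriptions, parent_child, hpo_gene = build_relations(SAMPLE_TSV)
```

Keeping the DAG as a flat list of (parent, child) pairs mirrors the relation described above and makes it easy to index by parent later.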

We can now easily filter to find all the descendants of HP:0030126 by using the special INDAG functional operator, which behaves similarly to column in ( .. ).
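The idea behind such a filter can be sketched in plain Python as a breadth-first walk over the (parent, child) pairs, followed by an ordinary membership test. This is only an illustration of the concept, not how GOR implements INDAG. Whether the start term itself passes the filter is treated here as an assumption (the include_root flag), and all IDs except HP:0030126 are made-up placeholders.

```python
from collections import defaultdict, deque

# Hypothetical edges; only HP:0030126 comes from the text.
PARENT_CHILD = [
    ("HP:0030126", "HP:X1"),
    ("HP:X1", "HP:X2"),
    ("HP:OTHER", "HP:X3"),
]


def descendants(parent_child, root, include_root=True):
    """Collect every node reachable from root by following parent -> child."""
    children = defaultdict(set)
    for parent, child in parent_child:
        children[parent].add(child)
    # Assumption: the root term itself counts as a match.
    seen = {root} if include_root else set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


in_dag = descendants(PARENT_CHILD, "HP:0030126")
codes = ["HP:X2", "HP:X3", "HP:0030126"]
# Once the descendant set is known, the filter is a plain "in" test.
passed = [code for code in codes if code in in_dag]
```

Here passed keeps HP:X2 and HP:0030126 and drops HP:X3, which sits under a different parent.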

Another very useful command is DAGMAP, which works with parent-child DAGs in a similar way to how the MULTIMAP command works with regular relations.

We can easily see that all the codes that passed through the filter "where hpocode indag([#parentchild#],'HP:0030126')" are descendants of HP:0030126.
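As a rough mental model, a DAG-aware multi-map expands each input code into one output row per node reachable from it. The Python sketch below does that and also records the shortest distance from the input code. The direction of the expansion (towards descendants), the distance column, and the placeholder IDs are assumptions for illustration; consult the GOR documentation for the exact DAGMAP output.

```python
from collections import defaultdict, deque

# Hypothetical diamond-shaped DAG: HP:X3 has two parents.
PARENT_CHILD = [
    ("HP:0030126", "HP:X1"),
    ("HP:0030126", "HP:X2"),
    ("HP:X1", "HP:X3"),
    ("HP:X2", "HP:X3"),
]


def dag_expand(parent_child, codes):
    """Return (input_code, reachable_node, distance) rows for each code."""
    children = defaultdict(set)
    for parent, child in parent_child:
        children[parent].add(child)
    rows = []
    for code in codes:
        dist = {code: 0}  # breadth-first search yields shortest distances
        queue = deque([code])
        while queue:
            node = queue.popleft()
            for child in sorted(children.get(node, ())):
                if child not in dist:
                    dist[child] = dist[node] + 1
                    queue.append(child)
        # Emit each reachable node once, nearest first.
        ordered = sorted(dist.items(), key=lambda kv: (kv[1], kv[0]))
        rows.extend((code, node, d) for node, d in ordered)
    return rows


expanded = dag_expand(PARENT_CHILD, ["HP:0030126"])
```

Because HP:X3 is reachable along two paths, the sketch reports it once, with its shortest distance.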

Similarly, we can see which genes map directly or indirectly to the HPO term HP:0030126.

Finally, we show how one can search for genes based on filtering of HPO terms. The filter is a generic search filter on the HPO description. The HPO terms that pass the filter are then used to find all the descendant HPO terms and all the genes associated with them. Notice that we use the SPLIT command to separate each gene into its own row. Then we count and annotate with GRANNO how many genes are associated with each HPO term, and order the rows such that the most specific terms show up first for each gene.

In order to display the wide columns, we use a short Python snippet to print the results.
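The steps of this search can be mimicked in plain Python to make the logic explicit. The sketch below is not the GOR query itself: the data, the placeholder IDs and gene names, and the reading of "most specific" as "fewest associated genes" are all assumptions made for illustration.

```python
from collections import Counter, defaultdict, deque

# Hypothetical data; only HP:0030126 comes from the text.
DESCRIPTIONS = {
    "HP:0030126": "placeholder root abnormality",
    "HP:X1": "placeholder child",
    "HP:X2": "placeholder grandchild",
    "HP:X3": "unrelated placeholder",
}
PARENT_CHILD = [("HP:0030126", "HP:X1"), ("HP:X1", "HP:X2")]
HPO_GENES = {  # comma-separated gene lists, one per term
    "HP:0030126": "GENE1,GENE2,GENE3",
    "HP:X1": "GENE2,GENE3",
    "HP:X2": "GENE3",
    "HP:X3": "GENE9",
}


def descendants(parent_child, root):
    """Root plus everything reachable from it via parent -> child edges."""
    children = defaultdict(set)
    for parent, child in parent_child:
        children[parent].add(child)
    seen, queue = {root}, deque([root])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def genes_for_search(needle):
    # 1. Generic text filter on the description.
    hits = [c for c, text in DESCRIPTIONS.items() if needle.lower() in text.lower()]
    # 2. Expand the hits to all their descendant terms.
    terms = set()
    for code in hits:
        terms |= descendants(PARENT_CHILD, code)
    # 3. One row per (gene, term), like splitting a gene list into rows.
    rows = [(gene, term)
            for term in terms
            for gene in filter(None, HPO_GENES.get(term, "").split(","))]
    # 4. Annotate each row with the number of genes on its term.
    per_term = Counter(term for _, term in rows)
    annotated = [(gene, term, per_term[term]) for gene, term in rows]
    # 5. Per gene, terms with fewer genes (assumed more specific) come first.
    return sorted(annotated, key=lambda r: (r[0], r[2], r[1]))


result = genes_for_search("root")
```

For GENE3 the rows come out as HP:X2, then HP:X1, then HP:0030126, that is, from the narrowest term to the broadest.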
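One dependency-free way to print wide columns without truncation is to pad every column to its widest cell. The helper below is a hypothetical sketch of that idea (the name format_table and the sample rows are made up) and is not necessarily what the notebook itself uses.

```python
def format_table(header, rows):
    """Render rows as left-aligned text columns, never truncating a cell."""
    table = [tuple(map(str, header))] + [tuple(map(str, r)) for r in rows]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table
    ]
    return "\n".join(lines)


# Placeholder rows standing in for a query result.
print(format_table(
    ("gene", "hpo_code", "description"),
    [("GENE3", "HP:X2", "a long placeholder description that should not be cut off")],
))
```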