Add a collapse command by beckyjackson · Pull Request #578 · ontodev/robot · GitHub

Add a collapse command #578

Merged
merged 17 commits into from
Mar 3, 2020

Conversation

beckyjackson
Contributor
@beckyjackson commented Oct 11, 2019

Collapse

Sometimes, a class hierarchy can contain more intermediate classes than necessary, especially when extracting modules. ROBOT includes collapse to remove intermediate classes based on a minimum number of subclasses, set with the --threshold option.

robot collapse \
 --input module.owl \
 --threshold 3 \
 --output minimized_module.owl

Any intermediate class (one that has one or more subclasses) with fewer subclasses than the threshold will be removed. Top-level classes (those without a named superclass) and bottom-level classes (those without any subclasses) will not be removed.

For example, given --threshold 2:

- class:A
    - class:B
    - class:C
        - class:D
            - class:E
    - class:F
        - class:G
        - class:H

Becomes:

- class:A
    - class:B
    - class:E
    - class:F
        - class:G
        - class:H

class:C and class:D are removed because each has only one subclass, which is below the threshold, so class:E moves up to class:A. class:F is kept because it has two subclasses, which meets the threshold.
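The removal rule above can be sketched in a few lines of Python. This is only an illustration, not ROBOT's implementation: the nested-dict tree and the `collapse_tree` helper are assumptions made for the example, every class is assumed to have a single named superclass, and subclass counts are taken from the input hierarchy before anything is removed.

```python
def _collapse_children(subtree, threshold):
    """Collapse one level of a nested {class: {subclass: {...}}} tree."""
    result = {}
    for name, kids in subtree.items():
        new_kids = _collapse_children(kids, threshold)
        if 0 < len(kids) < threshold:
            # Sparse intermediate class: drop it and splice its
            # (already collapsed) subclasses into its parent.
            result.update(new_kids)
        else:
            # Leaf (no subclasses) or class that meets the threshold: keep it.
            result[name] = new_kids
    return result


def collapse_tree(tree, threshold):
    """Collapse everything below the top-level classes, which are always kept."""
    return {top: _collapse_children(kids, threshold) for top, kids in tree.items()}


# The example hierarchy from the description above.
example = {
    "class:A": {
        "class:B": {},
        "class:C": {"class:D": {"class:E": {}}},
        "class:F": {"class:G": {}, "class:H": {}},
    }
}

collapsed = collapse_tree(example, 2)
```

With a threshold of 2, `collapsed` has class:E directly under class:A and class:F unchanged, matching the tree shown above.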

If there are any classes that you don't want removed, you can keep them regardless of the number of subclasses using --precious <IRI/CURIE> (for a set of terms in a file, use --precious-terms <term-file>).

robot collapse \
 --input uberon_module.owl \
 --threshold 3 \
 --precious UBERON:0000483 \
 --output results/uberon_minimized.owl

For example, given --threshold 2 and --precious class:D, the same hierarchy as above would become:

- class:A
    - class:B
    - class:D
        - class:E
    - class:F
        - class:G
        - class:H
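A rough Python sketch of how a set of precious terms could interact with the threshold is given here. It is an illustration under stated assumptions rather than ROBOT's code: the child-to-parent dictionary, the `collapse_parents` and `render` helpers, and the single-superclass hierarchy are all hypothetical, and subclass counts are taken from the input hierarchy.

```python
def collapse_parents(parent_of, threshold, precious=frozenset()):
    """Return a new child -> parent map without sparse intermediate classes."""
    # Count direct subclasses in the input hierarchy.
    counts = {}
    for parent in parent_of.values():
        if parent is not None:
            counts[parent] = counts.get(parent, 0) + 1

    removed = {
        cls
        for cls, parent in parent_of.items()
        if parent is not None                    # top-level classes stay
        and 0 < counts.get(cls, 0) < threshold   # sparse intermediate class
        and cls not in precious                  # precious terms stay
    }

    result = {}
    for cls, parent in parent_of.items():
        if cls in removed:
            continue
        # Re-attach to the nearest ancestor that was not removed.
        while parent in removed:
            parent = parent_of[parent]
        result[cls] = parent
    return result


def render(parent_of, root, depth=0):
    """Render the hierarchy as indented list lines, children sorted by name."""
    lines = ["    " * depth + "- " + root]
    for cls in sorted(c for c, p in parent_of.items() if p == root):
        lines.extend(render(parent_of, cls, depth + 1))
    return lines


tree = {
    "class:A": None,
    "class:B": "class:A",
    "class:C": "class:A",
    "class:D": "class:C",
    "class:E": "class:D",
    "class:F": "class:A",
    "class:G": "class:F",
    "class:H": "class:F",
}

kept = collapse_parents(tree, threshold=2, precious={"class:D"})
print("\n".join(render(kept, "class:A")))
```

Here class:C is still removed, but class:D survives because it is in the precious set, so class:E keeps class:D as its parent.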

@jamesaoverton
Member

@ontodev/robot-team Anyone have any thoughts on this implementation? Do you have code that does something similar?

@jamesaoverton changed the title from "Add a minimize command" to "WIP: Add a minimize command" Nov 20, 2019
@jamesaoverton
Member

Before merging this, I'd like @beckyjackson to do a comparison with OntoFox computed intermediates for some reasonably fancy ontology.

@jamesaoverton added this to the v1.6.0 milestone Nov 28, 2019
@beckyjackson
Contributor Author

@jamesaoverton I think extract --intermediates minimal is more like OntoFox's computed intermediates.

Lower terms:

http://purl.obolibrary.org/obo/OBI_0000210
http://purl.obolibrary.org/obo/OBI_0002618
http://purl.obolibrary.org/obo/OBI_0002050
http://purl.obolibrary.org/obo/OBI_0000654
http://purl.obolibrary.org/obo/OBI_0001670

ROBOT method:

robot extract --input test.owl \
  --method mireot \
  --lower-terms terms.txt \
  --intermediates minimal \
  --output robot.owl

OntoFox file:

[URI of the OWL(RDF/XML) output file]


[Source ontology]
OBI

[Low level source term URIs]
http://purl.obolibrary.org/obo/OBI_0000210
http://purl.obolibrary.org/obo/OBI_0002618
http://purl.obolibrary.org/obo/OBI_0002050
http://purl.obolibrary.org/obo/OBI_0000654
http://purl.obolibrary.org/obo/OBI_0001670

[Top level source term URIs and target direct superclass URIs]
http://purl.obolibrary.org/obo/BFO_0000001

[Source term retrieval setting]
includeComputedIntermediates

[Source annotation URIs]
includeAllAnnotationProperties

[Source annotation URIs to be excluded]

These two methods produce the exact same hierarchy:
[image: hierarchy produced by both methods]

Using minimize, on the other hand, usually produces something different since you can specify the threshold. In this example, if you use --threshold 2, you actually get the same result since it keeps anything that has two or more subclasses:

robot extract --input test.owl \
  --method mireot \
  --lower-terms terms.txt \
minimize --threshold 2 \
  --output minimize.owl

Once you raise the threshold to 3, it removes pretty much all intermediates because nothing has 3 subclasses.
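To see why the result changes so sharply between thresholds, it can help to count direct subclasses for each intermediate class. The sketch below does this on a made-up hierarchy. The class names, the child-to-parent dictionary, and the `intermediates_kept` helper are all hypothetical, and this is not how ROBOT computes it internally.

```python
from collections import Counter


def intermediates_kept(parent_of, threshold):
    """List intermediate classes with at least `threshold` direct subclasses."""
    counts = Counter(p for p in parent_of.values() if p is not None)
    return sorted(
        cls
        for cls, parent in parent_of.items()
        # Skip top-level classes; leaves have a count of 0 and never qualify.
        if parent is not None and counts[cls] >= threshold
    )


# Hypothetical hierarchy: X has two direct subclasses, Y and Z have one each.
hierarchy = {
    "root": None,
    "X": "root",
    "Y": "root",
    "Z": "Y",
    "a": "X",
    "b": "X",
    "c": "Z",
}

for threshold in (1, 2, 3):
    print(threshold, intermediates_kept(hierarchy, threshold))
```

In this made-up hierarchy, a threshold of 2 keeps only X, and a threshold of 3 keeps no intermediate classes at all, because no class has three direct subclasses.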

@jamesaoverton
Member

@ontodev/robot-team We're planning to merge this soon. Any feedback would be appreciated!

@cmungall
Contributor
cmungall commented Jan 6, 2020

I don't have strong opinions. I don't really understand the motivation. I am always suspicious about counting levels as they are not often meaningful. Can you give an example of where this is to be used?

I worry a bit that it uses a quite general term minimize for what seems like a very niche(?) use case. What about prune or truncate or trim or chop or ...

@beckyjackson
Contributor Author

I'm OK with changing the name, but I just want to make sure it correctly conveys that we are removing intermediate nodes, not any bottom or top-level things.

Our original use case for this was to minimize the ChEBI tree for IEDB. We have a set of "precious" terms that are the terms that we care about for the IEDB, but there are too many intermediate nodes to get there.

It does indiscriminately remove terms, although if there are important intermediate terms (e.g., carbohydrate in ChEBI) you can add them to the "precious" terms and they won't be removed.

@cmungall
Contributor
cmungall commented Jan 6, 2020

Ah, I get it now. For some reason I was reading threshold as being depth, but it's actually the direct subclass count (as you stated clearly; I just didn't read closely).

OK, so I can see how this is useful for certain kinds of ontology structures where there are lots of single children. I expect ncbitaxon too. But IMO this is really just poor ontology design to have this orphan pattern (ncbitaxon ok as it's really a species taxonomy, and doesn't generally include extinct taxa). So I don't see it being used outside a handful of odd ontologies? That's not to say we shouldn't have it, just observing.

I don't really see a use case for values other than 2?

@jamesaoverton
Member

We'll look for a more specific name. When building simplified trees to show to users, we are using thresholds higher than 2.

@beckyjackson will check whether the code for the existing extract --intermediates minimal can be replaced as a special case of this more general code.

@beckyjackson
Contributor Author

What about collapse?

@wdduncan
wdduncan commented Jan 9, 2020 via email

@matentzn
Contributor
matentzn commented Jan 9, 2020

+1 to collapse!

@beckyjackson changed the title from "WIP: Add a minimize command" to "WIP: Add a collapse command" Jan 21, 2020
@beckyjackson
Contributor Author

I changed the method to collapse and also rewrote the extract --intermediates minimal feature to reuse this code.

@jamesaoverton changed the title from "WIP: Add a collapse command" to "Add a collapse command" Feb 11, 2020
@jamesaoverton
Member

We renamed this command and plan to merge it in the next few days. Last chance to comment!

@jamesaoverton
Member

What about a default --threshold value of 2?

@beckyjackson
Contributor Author

That makes sense to me. I just added it.

@jamesaoverton mentioned this pull request Feb 28, 2020
@jamesaoverton merged commit 546d317 into master Mar 3, 2020
@jamesaoverton deleted the minimize branch June 16, 2022 15:53
5 participants