8000 Remove foreign axioms to create base product by beckyjackson · Pull Request #570 · ontodev/robot · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Remove foreign axioms to create base product #570

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Nov 20, 2019
Merged

Conversation

beckyjackson
Copy link
Contributor

New options for remove/filter:

  • Global --base <namespace>: add one or more base namespaces for the input ontology, e.g. http://purl.obolibrary.org/obo/OBI_
  • --select foreign: select all entities that have a foreign namespace (other than base, and other than a few defaults: oboInOwl, owl, rdf, rdfs, xsd as well as IAO:0000115)
  • --exclude-term <term>/--exclude-terms <term-file>: term or term file to exclude from being removed or filtered for

Are there other IAO terms that should be hard-coded to be excluded?

@matentzn
Copy link
Contributor

Awesome @beckyjackson that’s really great. Please give me a two days to reviews this PR before merging it; I have battled with this issue quite a bit and I would like to get it right :)

@matentzn
Copy link
Contributor

Hey all, thanks for making a push in this direction. I don't want to make this a decade long discussion like in the case of --signature true (:P, sorry), but at least I would like to let you know what we would need to be able to use this functionality.

First of all, some terminology to agree on (hope this is undisputed):

  • the base (release) is the set of all base-axioms in the ontology
  • a base axiom is any axiom that contains at least one base term (INTERNAL:A sub: EXTERNAL:B, INTERNAL:A sub INTERNAL:B, INTERNAL:A sub EXTERNAL:R some EXTERNAL:B, EXTERNAL:B sub EXTERNAL R some INTERNAL:A
  • a base term is a term confirming to one the base namespaces of the ontology
  • a base namespace is an IRI pattern that matches terms belonging to an ontology (aka base terms)
  • an external term is any term that is not a base term
  • an external axiom is any axiom that is not a base axiom

There are three use cases here that I need to support:

  1. Spring clean use case:

    • Over the years, people have either deliberately merged external ontologies into their edit files, or external axioms have slipped into the edit file accidentally during protege edits. The most typical case here is EXTERNAL:ID_0001 owl:SubclassOf owl:Thing, but there are many.
    • From time to time, I would like to do a clean up of the edit file that removes all external axioms. This includes, in particular:
      • Declarations of entities belonging to external namespaces
      • Any external axioms according to above definition
    • This use case is important because 1) it enables a clean base release and 2) stale, outdated axioms do not cause any inconsistencies down the line.
  2. The base release and the simple release use case:

    • Same as above, only applied to the release file instead of the edit file. This would be default for ODK config.

So afaics, your notion of base namespace is in line with what I expect. However, I am not sure why we would want to make any exceptions at all when defining "external terms". You say an external term is everything that does not confirm to an internal namespace except for x, y, z. (as an aside, in your view of the term: What if A subclass: R some B is used? Would R not be selected as foreign?)

Rather than focusing ROBOT on a notion of external or internal terms, I would prefer a notion of external or internal axioms (--axioms internal), and I would define it like this:

filter --term-file $(BASESEED) --select "annotations ontology anonymous self" --trim true --signature true (example)`

How does this differ from your proposal?

@beckyjackson
Copy link
Contributor Author

So afaics, your notion of base namespace is in line with what I expect. However, I am not sure why we would want to make any exceptions at all when defining "external terms".

OBI, for example, has "adopted" some terms from other namespaces that are part of the base ontology but do not start with the OBI namespace. This allows for exceptions like this. Additionally, not all IAO annotation properties and RO object properties are excluded from being removed (although I would like to see if there are more we should add), so the base might lose important info if you don't have an exclusion case.

as an aside, in your view of the term: What if A subclass: R some B is used? Would R not be selected as foreign?

If R is not in the namespace and is not in the excluded terms, then it would be selected as foreign. As long as you use --trim false, the axiom will be maintained, though, assuming A is a base term.

I would prefer a notion of external or internal axioms (--axioms internal)

This is definitely possible. Would this be easier to use? My concern here is that this differs from how we have used --axioms in the past. This option always corresponds to an OWLAxiom class, with a few shortcuts to group OWLAxiom classes together.

@matentzn
Copy link
Contributor

OBI, for example, has "adopted" some terms from other namespaces that are part of the base ontology but do not start with the OBI namespace.

100% I have the same. My argument was only against hard-coding exceptions. The base parameters takes a set of IRI patterns instead of "namespaces", and then everyone can decide whether IAO:0000115 should be in it or not (you already have a nice language for representing IRI patterns, we could just use that).

So I would suggest this:

  • instead of --base as you suggest it, we have two parameters --base-iris which takes a list of valid iri patterns, and --obo-base-iris which adds all these default annotation properties you are suggesting, like IAO, the obo in owl, subsets etc.

Maybe something like this:

  • filter -i $IN --base-iris hp: http://purl.org/test --axioms internal -o $OUT --trim false: Creates a base module from $IN by removing all external axioms according to above spec.
  • filter -i $IN --base-iris hp: http://purl.org/test --axioms external -o $OUT --trim false: Creates a base module from $IN by removing all internal axioms according to above spec.

We should be a bit careful though with axiom annotations; these should, IMHO, never be removed (if the axiom is not removed).

(Maybe not further complicate remove command if the behaviour is exactly the same as filter? I dont mind.)

@beckyjackson
Copy link
Contributor Author
beckyjackson commented Sep 25, 2019

The base parameters takes a set of IRI patterns instead of "namespaces"

Since we are setting --base (or maybe --base-iri/--base-iris) as a global option, I'd like to be able to reuse it in the future. I'm not sure for what yet, but I think it makes more sense to just have this be the namespace, instead of asking for patterns. I like the idea of having an --exclude-defaults-bases <true/false> global option (default is false). If that option is false, those namespaces get added to the bases. Then again, if we want to reuse --base somewhere, we may not want the defaults anyway... 😕

then everyone can decide whether IAO:0000115 should be in it or not

I guess instead of hard coding it, people would just need to include it in their --exclude-term if they wanted to keep it (with this current method). Unless it makes sense to include IAO and RO as base namespaces for the OBO/OWL defaults? But then we might get into trouble with people importing certain terms from multiple files (which is currently causing conflicts in OBI).

you already have a nice language for representing IRI patterns, we could just use that

Another alternative is allowing both full IRIs and IRI patterns in the --exclude-term option.

I'm still worried about implementing this in the --axioms option because of how that option works. That option gets parsed into a Set<Class<? extends OWLAxiom>> which means we'd have to make exceptions for these internal/external cases. Not only would we need to change how the option is parsed, but we would also need to change how the option is eventually handled. To include axioms based on axiom type now, we just check if the axiom extends that particular OWLAxiom class:

public static boolean extendsAxiomTypes(
OWLAxiom axiom, Set<Class<? extends OWLAxiom>> axiomTypes) {
for (Class<? extends OWLAxiom> axiomType : axiomTypes) {
if (axiomType.isAssignableFrom(axiom.getClass())) {
return true;
}
}
return false;
}

@jamesaoverton
Copy link
Member

I agree with @beckyjackson that --axiom internal would be a major change to the current code.

@matentzn If filter --term-file $(BASESEED) --select "annotations ontology anonymous self" --trim true --signature true works, what if we just make it easier to build BASESEED?

  • All internal entities (all entities conforming to the namespaces you supply)

Yes. The simplest thing would be to enhance --term and --term-file to accept prefixes, then any term IRI that matches a specified prefix is included in the initial term set.

  • All object properties in O (So that we can get A sub R some B)

I don't understand this: wouldn't that make pretty much all class expressions count as internal?

  • And, for OBO file format compatibility ...

These could be added with current --term or --term-file.

I think this would be a simple extension that could work. I'm worried that we'll eventually need more powerful pattern matching and/or term exclusion patterns, but we'll keep forward-compatibility in mind.

@matentzn
Copy link
Contributor

@matentzn If filter --term-file $(BASESEED) --select "annotations ontology anonymous self" --trim true --signature true works, what if we just make it easier to build BASESEED?

Now that I think of it; this comment was made in haste; this is NOT base. This is simple. Lets forget I ever said it. This PR is not affected by this issue, and I will make a ticket at a later state to deal with BASESEED.

Yes. The simplest thing would be to enhance --term and --term-file to accept prefixes, then any term IRI that matches a specified prefix is included in the initial term set.

That would work! Great idea! This would be all I need; be able to supply sets of iri-patterns to termfile (or one to term). If we do this, we dont really need the --base parameter anymore?

I don't understand this (Object Properties): wouldn't that make pretty much all class expressions count as internal?

Yeah sorry, I mixed up two issues. Apologies, please ignore.

@beckyjackson
Copy link
8000 Contributor Author

After a lot of back and forth, I think we have finally come up with a solution.

We ended up implementing your original suggestion to have internal and external axiom selectors. We tried to make it work with the objects, but if you have any property in your select set, you might end up with external axioms that use those properties.

The final implementation combines --base-iri <namespace> and --axioms. The base IRI can either be the full namespace or a prefix (e.g., --base-iri OBI). If you do not specify one or more base namespaces, internal and external selectors are ignored.

If you want a true "base", make sure to use --preserve-structure false (-p) or you'll get subclass of relationships between external terms. This step works after all the other axiom selectors, so --trim has already been processed.

For example, an OBI base:

robot remove --input obi.owl \
  --base-iri OBI --base-iri CHMO \
  --axioms external -p false \
  --output obi-base.owl

And if you want to include/exclude certain terms, this implementation still has the --include-term and --exclude-term options.

@beckyjackson
Copy link
Contributor Author

From @matentzn

My first round of tests worked beautifully!

I think it would be good to write up the spec for this in a bit more detail; My main concerns are:

Given a base iri A:

  1. A:001 Sub B:001 [xref X:001] (all axiom annotations on internal axioms are included, and annotations on annotations as well)
  2. A:001 label "label" [xref X:001] (all axiom annotations on internal entities are included)
  3. B:001 label "label" [xref A:001] <- if there is an axiom annotation pointing to an internal IRI/Curie, it is NOT included (I guess?)

Could we maybe gather the current spec for the command in one comment somewhere on the PR? I will review it again then. But the behaviour looks good.

Only axioms that have a subject in base "A" are included. Therefore, in case 3, the xref using the internal term A would not be included because it doesn't point to an internal subject. Are axiom annotations behaving as you would expect them to in these cases you outlined above?

@beckyjackson
Copy link
Contributor Author

@matentzn - just wanted to follow up on my last comment on this post. I'd like to make sure this is working well for your use cases 🙂

@matentzn
Copy link
Contributor

Hey Becky, where is the latest jar that reflects this branch? I can test today.

@jamesaoverton
Copy link
Member

Now that I added a Jenkinsfile to the branch, Jenkins is built the latest JAR here: https://build.obolibrary.io/job/ontodev/job/robot/job/foreign-ax/

@matentzn
Copy link
Contributor

Thanks James. This is really helpful!

@matentzn
Copy link
Contributor

Thats it! This command will help making the OBO world a better place. And my live in particular. I tested it on the three cases I care about by running first something like this (make):

mp.owl:
	./robot merge -I $(OBOPURL)/mp.owl -o $@	

test_foreign_axiom: mp.owl
	./robot --version
	./robot remove -i $< --base-iri MP --base-iri http://purl.obolibrary.org/obo/mp/ --axioms external -p false -o $@_int.owl
	./robot remove -i $< --base-iri MP --base-iri http://purl.obolibrary.org/obo/mp/ --axioms internal -p false -o $@_ext.owl
	./robot merge -i $@_ext.owl -i $@_int.owl -o $@_merged.owl
	./robot diff --left $@_merged.owl --right $< -o $@_diff.txt

And then inspecting the two slices manually (and the diff, which is empty). I can confirm yes; I am happy with that. I have not looked at super fringe cases yet (axiom annotations on axiom annotations that annotate an internal axiom), but to be honest, l am happy to just let those occur and report them as bugs when they come. Thanks so much!

@jamesaoverton jamesaoverton changed the title [WIP] Remove foreign axioms to create base product Remove foreign axioms to create base product Nov 20, 2019
@jamesaoverton jamesaoverton merged commit ab0d9e8 into master Nov 20, 2019
@jamesaoverton jamesaoverton deleted the foreign-ax branch March 2, 2020 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0