Stringent comparison of CDS using --strict-match · Issue #92 · gpertea/gffcompare · GitHub

Stringent comparison of CDS using --strict-match #92

Open
etvedte opened this issue Jul 29, 2024 · 0 comments
etvedte commented Jul 29, 2024

Greetings,

I am interested in computing accuracy metrics for a query GFF against a reference. The reference/query files have both CDS and exon features. I want to perform accuracy calculations using strict terminal boundaries, operating on CDS specifically.

I did some testing and made the following observations:

  1. Exon features take priority for the accuracy metrics, but CDS features can still be used. That is, when the file contains both CDS and exon rows, removing the exon rows changes the accuracy values, whereas removing the CDS rows does not.
  2. The -e parameter reads "max. distance (range) allowed from free ends of terminal exons of reference transcripts." Setting -e 0 on the CDS file changes only the exon-level accuracy metrics; the transcript- and locus-level metrics are unchanged. Unsurprisingly, sensitivity and precision dip slightly with -e 0.
  3. The documentation says the following under the transcript-level description, but not in the parameter list: "Using --strict-match option can make the accuracy estimation at this level much more stringent by only allowing a limited variation of the outer coordinates of the terminal exons (by at most 100 bases by default, but this value can be changed with the -e option)." When I set --strict-match -e 0, the exon- and intron-level metrics stay the same as with -e 0 alone, but the intron-chain, transcript, and locus-level metrics all decrease.
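For anyone who wants to reproduce the three tests above, here is a minimal Python sketch. It only removes one feature type from GFF text and assembles the three command lines (default, -e 0, --strict-match -e 0); it does not run gffcompare. The file names and output prefixes are placeholders I made up, and the helper names `drop_feature` and `gffcompare_cmd` are hypothetical, not part of gffcompare.

```python
import shlex


def drop_feature(gff_lines, feature):
    """Return the GFF lines with every row of type `feature` (column 3) removed.

    Comment and directive lines (starting with '#') are kept unchanged.
    """
    kept = []
    for line in gff_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[2] == feature:
            continue  # skip rows of the unwanted feature type
        kept.append(line)
    return kept


def gffcompare_cmd(ref, query, prefix, max_end_dist=None, strict=False):
    """Assemble a gffcompare command line as a list of arguments."""
    cmd = ["gffcompare", "-r", ref, "-o", prefix]
    if strict:
        cmd.append("--strict-match")
    if max_end_dist is not None:
        cmd += ["-e", str(max_end_dist)]
    cmd.append(query)
    return cmd


# Placeholder file names: CDS-only versions of the reference and query.
REF, QRY = "ref.cds.gff", "query.cds.gff"
runs = {
    "default": gffcompare_cmd(REF, QRY, "cmp_default"),
    "e0": gffcompare_cmd(REF, QRY, "cmp_e0", max_end_dist=0),
    "strict_e0": gffcompare_cmd(REF, QRY, "cmp_strict_e0", max_end_dist=0, strict=True),
}
for name, cmd in runs.items():
    print(name, shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` if gffcompare is on the PATH; the summary statistics for the three runs can then be compared side by side.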

Given the observations above, I think --strict-match -e 0 is the correct way to stringently compare CDS. Do you agree, or do you have a different suggestion? The parameter --strict-match isn't clearly described in the documentation. Does "only allowing a limited variation of the outer coordinates of the terminal exons" mean that when gffcompare is run with default settings (--strict-match not specified), terminal exon boundaries can differ greatly as long as the intron chains match?
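To make the question concrete, here is a toy Python model of the two kinds of match as I read the quoted documentation sentence. It is my assumption about the intended behavior, not gffcompare's actual implementation, and it only covers multi-exon transcripts. The function names and the coordinates are made up for illustration.

```python
def intron_chain(exons):
    """Introns implied by a list of (start, end) exons, 1-based inclusive."""
    exons = sorted(exons)
    return [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(exons, exons[1:])
    ]


def chain_match(ref, qry):
    """True when both transcripts have identical intron chains."""
    return intron_chain(ref) == intron_chain(qry)


def strict_match(ref, qry, max_end_dist=100):
    """Intron-chain match plus outer ends within `max_end_dist` bases.

    Assumed reading of the documentation, for multi-exon transcripts only.
    """
    ref, qry = sorted(ref), sorted(qry)
    return (
        chain_match(ref, qry)
        and abs(ref[0][0] - qry[0][0]) <= max_end_dist
        and abs(ref[-1][1] - qry[-1][1]) <= max_end_dist
    )


# Made-up coordinates: same introns, but the query's last exon ends 300 bases later.
ref = [(100, 200), (300, 400), (500, 600)]
qry = [(50, 200), (300, 400), (500, 900)]

print(chain_match(ref, qry))                   # intron chains are identical
print(strict_match(ref, qry))                  # 3' end differs by more than 100
print(strict_match(ref, ref, max_end_dist=0))  # identical transcripts
```

Under this reading, the first call returns True while the second returns False, which is the situation I am asking about: a default-mode match that the strict mode would reject.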

As an aside, I am not sure why one would want to calculate accuracy using -e 0 alone, which is strict at the exon/intron level but allows some fuzziness in the parent-level features. Have you seen any specific use cases for this?

Thanks,

Eric
