Greetings,
I am interested in computing accuracy metrics for a query GFF against a reference. The reference/query files have both CDS and exon features. I want to perform accuracy calculations using strict terminal boundaries, operating on CDS specifically.
I did some testing and made the following observations:
Exon features take priority for the accuracy metrics, but CDS features can still be used. That is, starting from a CDS+exon file, removing the exon rows changes the accuracy values, whereas removing the CDS rows does not.
The -e parameter is described as "max. distance (range) allowed from free ends of terminal exons of reference transcripts." Setting -e 0 on the CDS file changes only the exon-level accuracy metrics; the transcript- and locus-level metrics are unchanged. Unsurprisingly, exon-level sensitivity and precision dip slightly with -e 0.
The documentation mentions --strict-match in the description of the transcript level, but not in the parameter list: "Using --strict-match option can make the accuracy estimation at this level much more stringent by only allowing a limited variation of the outer coordinates of the terminal exons (by at most 100 bases by default, but this value can be changed with the -e option)." When I set --strict-match -e 0, the exon- and intron-level metrics stay the same as with -e 0 alone, but the intron-chain, transcript, and locus levels all decrease.
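The tests behind these observations can be sketched roughly as below. This is a minimal illustration, not the exact script I ran: the file names (ref.cds.gff, query.cds.gff), the output prefixes, and the helper names drop_feature and gffcompare_cmd are placeholders made up for this example. Only the -r, -o, -e and --strict-match options are taken from gffcompare itself.

```python
def drop_feature(gff_lines, feature):
    """Return the GFF lines with every row of the given feature type removed.

    The feature type is column 3 of a tab-separated GFF row. Comment and
    blank lines are kept unchanged.
    """
    kept = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            kept.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[2] == feature:
            continue  # skip rows of the unwanted feature type
        kept.append(line)
    return kept


def gffcompare_cmd(ref, query, prefix, e=None, strict=False):
    """Build a gffcompare argument list (hypothetical helper, nothing is run)."""
    cmd = ["gffcompare", "-r", ref, "-o", prefix]
    if e is not None:
        cmd += ["-e", str(e)]
    if strict:
        cmd.append("--strict-match")
    cmd.append(query)
    return cmd


# Toy input: one transcript with one exon row and one CDS row.
example = [
    "##gff-version 3\n",
    "chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=tx1\n",
    "chr1\tsrc\texon\t100\t300\t.\t+\t.\tParent=tx1\n",
    "chr1\tsrc\tCDS\t150\t300\t.\t+\t0\tParent=tx1\n",
]
cds_only = drop_feature(example, "exon")

# The three configurations compared above (placeholder file names).
runs = [
    gffcompare_cmd("ref.cds.gff", "query.cds.gff", "default"),
    gffcompare_cmd("ref.cds.gff", "query.cds.gff", "e0", e=0),
    gffcompare_cmd("ref.cds.gff", "query.cds.gff", "strict_e0", e=0, strict=True),
]
for cmd in runs:
    print(" ".join(cmd))
```

Each command list can be passed to subprocess.run if gffcompare is on the PATH; the sketch only prints them so the three configurations are easy to compare side by side.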
Given the observations above, I think --strict-match -e 0 is the correct way to stringently compare CDS. Do you agree, or do you have a different suggestion? The --strict-match parameter isn't clearly described in the documentation. By "only allowing a limited variation of the outer coordinates of the terminal exons", does this mean that when running gffcompare with defaults (--strict-match not specified), terminal exon boundaries can differ arbitrarily as long as the intron chains match?
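To make the question concrete, here is a toy model of how I currently read the documentation. It is only my assumption about the behavior for multi-exon transcripts, not gffcompare's actual code: the function names and the coordinates are invented for illustration.

```python
def introns(exons):
    """Return the intron chain as (previous exon end, next exon start) pairs."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def transcript_match(ref, qry, strict=False, e=100):
    """Toy transcript-level match for multi-exon transcripts.

    Assumed behavior (not confirmed against gffcompare's source):
    - default: identical intron chains are enough for a match
    - strict: the outer coordinates of the terminal exons must also
      lie within e bases of the reference
    """
    if len(ref) < 2 or len(qry) < 2:
        raise ValueError("this toy model only covers multi-exon transcripts")
    if introns(ref) != introns(qry):
        return False
    if not strict:
        return True
    start_ok = abs(ref[0][0] - qry[0][0]) <= e
    end_ok = abs(ref[-1][1] - qry[-1][1]) <= e
    return start_ok and end_ok


# Exons as (start, end); both queries share the reference intron chain.
ref = [(100, 300), (500, 900)]
far = [(10, 300), (500, 2000)]    # terminal ends differ by 90 and 1100 bases
near = [(150, 300), (500, 900)]   # start differs by 50 bases

print(transcript_match(ref, far))                      # default mode
print(transcript_match(ref, far, strict=True))         # strict, e=100
print(transcript_match(ref, near, strict=True))        # strict, e=100
print(transcript_match(ref, near, strict=True, e=0))   # strict, e=0
```

If this reading is right, the first call matches despite the very different terminal boundaries, and only the strict calls are sensitive to them.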
As an aside, I am not sure why one would want to calculate accuracy using -e 0 alone, which allows some fuzziness in the parent-level features but is strict at exon/intron level. Have you observed any specific use cases for this?
Thanks,
Eric