Stringent comparison of CDS using --strict-match · Issue #92 · gpertea/gffcompare · GitHub
Stringent comparison of CDS using --strict-match #92
Open
@etvedte

Greetings,

I am interested in computing accuracy metrics for a query GFF against a reference. The reference/query files have both CDS and exon features. I want to perform accuracy calculations using strict terminal boundaries, operating on CDS specifically.

I did some testing and made the following observations:

  1. Exon features are prioritized for accuracy metrics, but CDS can still be used. That is, removing exon rows changes the accuracy values when calculated from CDS+exon, but removing CDS rows does not.
  2. The -e parameter reads "max. distance (range) allowed from free ends of terminal exons of reference transcripts." Setting -e 0 on the CDS file changes only the exon-level accuracy metrics; the transcript- and locus-level metrics are unchanged. Sensitivity and precision unsurprisingly dip slightly with -e 0.
  3. The documentation mentions --strict-match under the transcript-level description, but not in the parameter list: "Using --strict-match option can make the accuracy estimation at this level much more stringent by only allowing a limited variation of the outer coordinates of the terminal exons (by at most 100 bases by default, but this value can be changed with the -e option)." When I set --strict-match -e 0, the exon- and intron-level metrics are the same as with -e 0 alone, but the intron-chain, transcript, and locus-level metrics all decrease.
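
For anyone wanting to reproduce observation 1, the exon rows can be dropped with a few lines of Python. This is only a minimal sketch: the function name `drop_feature_rows` is hypothetical (it is not part of gffcompare), and it assumes a tab-delimited GFF/GTF with the feature type in column 3.

```python
def drop_feature_rows(gff_lines, feature="exon"):
    """Yield every GFF line except data rows whose feature type equals `feature`.

    Comment lines, blank lines, and malformed rows are passed through untouched.
    """
    for line in gff_lines:
        # Keep headers/comments and blank lines as-is.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        # Column 3 (index 2) holds the feature type in GFF/GTF.
        if len(cols) >= 3 and cols[2] == feature:
            continue
        yield line


if __name__ == "__main__":
    demo = [
        "##gff-version 3\n",
        "chr1\tsrc\tmRNA\t100\t600\t.\t+\t.\tID=t1\n",
        "chr1\tsrc\texon\t100\t200\t.\t+\t.\tParent=t1\n",
        "chr1\tsrc\tCDS\t150\t200\t.\t+\t0\tParent=t1\n",
    ]
    for kept in drop_feature_rows(demo):
        print(kept, end="")
```

Transcript rows are kept so that the remaining CDS rows still have a parent to attach to.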

Given the observations above, I think --strict-match -e 0 is the correct way to stringently compare CDS. Do you agree, or do you have a different suggestion? The --strict-match parameter isn't clearly described in the documentation. Does "only allowing a limited variation of the outer coordinates of the terminal exons" mean that when running default gffcompare (without --strict-match), terminal exon boundaries can differ by any amount so long as the intron chains match?
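
To make the question concrete, here is a toy sketch of the two matching rules as read from the quoted documentation. It is an assumption-laden illustration, not gffcompare's actual code: the function names are hypothetical, features are plain (start, end) tuples for one multi-exon transcript, and single-exon transcripts are deliberately not handled.

```python
def introns(features):
    """Return the intron chain implied by sorted (start, end) features."""
    feats = sorted(features)
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(feats, feats[1:])]


def chain_match(ref, qry):
    """True when both transcripts share an identical, non-empty intron chain."""
    ref_chain = introns(ref)
    return bool(ref_chain) and ref_chain == introns(qry)


def strict_match(ref, qry, e=100):
    """Intron-chain match plus outer ends within `e` bases (assumed reading of -e)."""
    if not chain_match(ref, qry):
        return False
    ref_s, qry_s = sorted(ref), sorted(qry)
    start_ok = abs(ref_s[0][0] - qry_s[0][0]) <= e
    end_ok = abs(ref_s[-1][1] - qry_s[-1][1]) <= e
    return start_ok and end_ok


if __name__ == "__main__":
    ref = [(100, 200), (300, 400), (500, 600)]
    qry = [(50, 200), (300, 400), (500, 700)]  # same introns, different outer ends
    print(chain_match(ref, qry))        # intron chains agree
    print(strict_match(ref, qry, e=0))  # outer ends differ, so this fails
```

Under this reading, the query above counts as a match by intron chain alone but not once the outer coordinates are held to -e 0, which would explain the drop seen at the intron-chain, transcript, and locus levels.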

As an aside, I am not sure why one would want to calculate accuracy using -e 0 alone, which allows some fuzziness in the parent-level features but is strict at exon/intron level. Have you observed any specific use cases for this?

Thanks,

Eric
