CDS phase (offset for eg ribo slippage) · Issue #76 · SACGF/cdot
CDS phase (offset for eg ribo slippage) #76
Open
@holtgrewe


I'm puzzled and need help understanding this. Consider NM_015068.3. I think the exons given in the cdot JSON skip one base that needs to be counted in the CDS, because it is read twice (ribosomal slippage). What do you think?

Here is what it looks like in cdot-0.2.24.refseq.grch37.json:

# jq '.transcripts | to_entries[] | (select(.key == "NM_015068.3"))' cdot-0.2.24.refseq.grch37.json
{
  "key": "NM_015068.3",
  "value": {
    "biotype": [
      "mRNA"
    ],
    "gene_name": "PEG10",
    "gene_version": "23089",
    "genome_builds": {
      "GRCh37": {
        "cds_end": 94294994,
        "cds_start": 94292868,
        "contig": "NC_000007.13",
        "exons": [
          [
            94285636,
            94285892,
            0,
            1,
            256,
            null
          ],
          [
            94292645,
            94299007,
            1,
            257,
            6618,
            null
          ]
        ],
        "note": "protein translation is dependent on -1 ribosomal frameshift%3B isoform 1 is encoded by transcript variant 1",
        "strand": "+",
        "url": "https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/105.20220307/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.gff.gz"
      }
    },
    "hgnc": "14005",
    "id": "NM_015068.3",
    "protein": "NP_055883.2",
    "start_codon": 479,
    "stop_codon": 2605
  }
}
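
To make the numbers above easier to check, here is a minimal sketch that maps the transcript-level `start_codon` / `stop_codon` onto the genome using the two exons. It assumes the exon fields are (genomic start, genomic end, exon index, cDNA start, cDNA end, gap), with 0-based half-open genomic coordinates and 1-based inclusive cDNA coordinates. That layout is my reading of the JSON, and `cdna_to_genomic` is a hypothetical helper, not part of cdot.

```python
# Exon tuples copied from the JSON above.
# ASSUMPTION: field order is (genomic start, genomic end, exon index,
# cDNA start, cDNA end, gap); genomic coordinates are 0-based half-open,
# cDNA coordinates are 1-based inclusive.
EXONS = [
    (94285636, 94285892, 0, 1, 256, None),
    (94292645, 94299007, 1, 257, 6618, None),
]


def cdna_to_genomic(pos, exons=EXONS):
    """Map a 1-based transcript position to a 1-based genomic position.

    Hypothetical helper: only handles '+' strand exons without alignment gaps.
    """
    for g_start, _g_end, _idx, c_start, c_end, _gap in exons:
        if c_start <= pos <= c_end:
            return g_start + 1 + (pos - c_start)
    raise ValueError(f"position {pos} is outside all exons")


start_codon, stop_codon = 479, 2605  # transcript values from the JSON

# start_codon looks 0-based, so +1 gives the 1-based first CDS base.
cds_start = cdna_to_genomic(start_codon + 1) - 1  # back to 0-based
cds_end = cdna_to_genomic(stop_codon)             # 1-based inclusive end
cds_length = stop_codon - start_codon

print(cds_start, cds_end)          # compare with cds_start / cds_end in the JSON
print(cds_length, cds_length % 3)  # a remainder != 0 means the span is not whole codons
```

Under these assumptions the mapped positions agree with `cds_start` and `cds_end` in the JSON, while the span between them is not a multiple of three, which is what one would expect if one base has to be counted twice.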

NCBI Gene says

[...]
     CDS             join(480..1436,1436..2605)
[...]
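
The `join(...)` location above can be checked with a few lines of arithmetic. This is a small sketch, not an official parser: `parse_join` is a hypothetical helper that only understands simple `a..b` segments, and it compares the number of translated bases with the plain span from the first to the last CDS base.

```python
import re


def parse_join(location):
    """Extract (start, end) pairs from a simple 'join(a..b,c..d)' string.

    Hypothetical helper: ignores complement(), partial markers, and so on.
    """
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


segments = parse_join("join(480..1436,1436..2605)")

# Bases the ribosome actually reads: each segment counted on its own.
translated = sum(end - start + 1 for start, end in segments)

# Plain span on the transcript, counting every base once.
span = segments[-1][1] - segments[0][0] + 1

# Positions shared by consecutive segments, i.e. read twice.
reread = []
for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
    reread.extend(range(next_start, prev_end + 1))

print(segments)
print(translated, translated % 3)
print(span, span % 3)
print(reread)
```

The translated length is divisible by three only when the shared position is counted in both segments; the plain span is one base short.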

The original GFF3 says

# egrep -w 'ID=gene14272|Parent=rna16954|NM_015068.3' ref_GRCh37.p10_top_level.gff3
NC_000007.13    RefSeq  gene    94285637        94299007        .       +       .       ID=gene14272;Name=PEG10;Dbxref=GeneID:23089,HGNC:14005,MIM:609810;description=paternally expressed 10;gbkey=Gene;gene=PEG10;gene_synonym=EDR,HB-1,Mar2,Mart2,MEF3L,RGAG3
NC_000007.13    RefSeq  mRNA    94285637        94299007        .       +       .       ID=rna16953;Name=NM_015068.3;Parent=gene14272;Dbxref=GeneID:23089,Genbank:NM_015068.3,HGNC:14005,MIM:609810;gbkey=mRNA;gene=PEG10;product=paternally expressed 10%2C transcript variant 1;transcript_id=NM_015068.3
NC_000007.13    RefSeq  exon    94285682        94285903        .       +       .       ID=id161180;Parent=rna16954;Dbxref=GeneID:23089,Genbank:NM_001172437.1,HGNC:14005,MIM:609810;gbkey=mRNA;gene=PEG10;product=paternally expressed 10%2C transcript variant 2;transcript_id=NM_001172437.1
NC_000007.13    RefSeq  exon    94292646        94299007        .       +       .       ID=id161181;Parent=rna16954;Dbxref=GeneID:23089,Genbank:NM_001172437.1,HGNC:14005,MIM:609810;gbkey=mRNA;gene=PEG10;product=paternally expressed 10%2C transcript variant 2;transcript_id=NM_001172437.1
NC_000007.13    RefSeq  CDS     94285899        94285903        .       +       0       ID=cds13063;Name=NP_001165908.1;Parent=rna16954;Note=isoform 3 is encoded by transcript variant 2;Dbxref=GeneID:23089,Genbank:NP_001165908.1,HGNC:14005,MIM:609810;exception=ribosomal slippage;gbkey=CDS;product=retrotransposon-derived protein PEG10 isoform 3;protein_id=NP_001165908.1
NC_000007.13    RefSeq  CDS     94292646        94293825        .       +       1       ID=cds13063;Name=NP_001165908.1;Parent=rna16954;Note=isoform 3 is encoded by transcript variant 2;Dbxref=GeneID:23089,Genbank:NP_001165908.1,HGNC:14005,MIM:609810;exception=ribosomal slippage;gbkey=CDS;product=retrotransposon-derived protein PEG10 isoform 3;protein_id=NP_001165908.1
NC_000007.13    RefSeq  CDS     94293825        94294994        .       +       0       ID=cds13063;Name=NP_001165908.1;Parent=rna16954;Note=isoform 3 is encoded by transcript variant 2;Dbxref=GeneID:23089,Genbank:NP_001165908.1,HGNC:14005,MIM:609810;exception=ribosomal slippage;gbkey=CDS;product=retrotransposon-derived protein PEG10 isoform 3;protein_id=NP_001165908.1
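
The same one-base overlap is visible in the CDS rows of the GFF3 listing above (the rows with `Parent=rna16954`). Below is a rough sketch that takes the three (start, end, phase) triples from those rows, looks for consecutive CDS segments that share bases, and checks whether the phase column is consistent with the segment lengths. It assumes the standard GFF3 meaning of phase (number of bases to skip at the start of the feature to reach the next codon); `next_phase` is a hypothetical helper.

```python
# (start, end, phase) copied from the three CDS rows above (1-based inclusive).
CDS_ROWS = [
    (94285899, 94285903, 0),
    (94292646, 94293825, 1),
    (94293825, 94294994, 0),
]


def next_phase(start, end, phase):
    """Phase expected on the following CDS segment.

    ASSUMPTION: GFF3 phase = bases to skip at the start of a feature
    before the first complete codon.
    """
    length = end - start + 1
    return (3 - (length - phase) % 3) % 3


overlaps = []   # genomic ranges covered by two consecutive CDS segments
phase_ok = []   # does each segment's phase follow from the previous one?
for (s1, e1, p1), (s2, _e2, p2) in zip(CDS_ROWS, CDS_ROWS[1:]):
    if s2 <= e1:
        overlaps.append((s2, e1))
    phase_ok.append(next_phase(s1, e1, p1) == p2)

total = sum(end - start + 1 for start, end, _ in CDS_ROWS)

print(overlaps)
print(phase_ok)
print(total, total % 3)
```

With the segments taken as written, the phases chain correctly and the summed length is a whole number of codons, so the GFF3 encodes the slippage as two CDS segments that both include the shared base.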
