I'm puzzled and need help understanding this. Consider NM_015068.3. I think the exons given in the cdot JSON omit one base that should appear twice in the CDS, because the ribosome reads it twice (ribosomal slippage). What do you think?
Here is what it looks like in cdot-0.2.24.refseq.grch37.json:
```
# jq '.transcripts | to_entries[] | (select(.key == "NM_015068.3"))' cdot-0.2.24.refseq.grch37.json
{
  "key": "NM_015068.3",
  "value": {
    "biotype": [
      "mRNA"
    ],
    "gene_name": "PEG10",
    "gene_version": "23089",
    "genome_builds": {
      "GRCh37": {
        "cds_end": 94294994,
        "cds_start": 94292868,
        "contig": "NC_000007.13",
        "exons": [
          [
            94285636,
            94285892,
            0,
            1,
            256,
            null
          ],
          [
            94292645,
            94299007,
            1,
            257,
            6618,
            null
          ]
        ],
        "note": "protein translation is dependent on -1 ribosomal frameshift%3B isoform 1 is encoded by transcript variant 1",
        "strand": "+",
        "url": "https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/105.20220307/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.gff.gz"
      }
    },
    "hgnc": "14005",
    "id": "NM_015068.3",
    "protein": "NP_055883.2",
    "start_codon": 479,
    "stop_codon": 2605
  }
}
```
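A quick sanity check on the flat start/stop values: the Python sketch below takes `start_codon`/`stop_codon` and `cds_start`/`cds_end` from the record above and checks whether the span is a whole number of codons. It assumes, as the numbers suggest, that `start_codon` and `cds_start` are 0-based starts and `stop_codon` and `cds_end` are end-exclusive; the variable names are only for illustration.

```python
# Values copied from the cdot record for NM_015068.3 above.
# Assumption: start_codon / cds_start are 0-based (interbase) starts and
# stop_codon / cds_end are end-exclusive, which is what the numbers suggest.
start_codon, stop_codon = 479, 2605
cds_start, cds_end = 94292868, 94294994

tx_cds_len = stop_codon - start_codon       # CDS length on the transcript
genomic_cds_len = cds_end - cds_start       # CDS length on NC_000007.13

# The whole CDS lies inside exon 2, so both spans should have the same length.
assert tx_cds_len == genomic_cds_len

codons, leftover = divmod(tx_cds_len, 3)
print(tx_cds_len, codons, leftover)  # 2126 708 2

# Counting one base a second time makes the CDS a whole number of codons.
print((tx_cds_len + 1) % 3)  # 0
```

Adding the single repeated base gives 2127 bases, which is exactly 709 codons.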
NCBI Gene says:

```
[...]
CDS join(480..1436,1436..2605)
[...]
```
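The NCBI location string makes the repeated base explicit. This sketch parses it, finds the transcript positions covered by both segments, and maps them to NC_000007.13 using the second cdot exon tuple. Reading that tuple as [genomic start (0-based), genomic end, exon index, transcript start, transcript end, gap] is an assumption inferred from the values, not taken from the cdot documentation; the code also only handles the plus strand.

```python
import re

# CDS location string as reported by NCBI Gene for NM_015068.3
location = "join(480..1436,1436..2605)"

# Each "start..end" pair is a 1-based, inclusive transcript range
segments = [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]
total = sum(end - start + 1 for start, end in segments)
print(segments, total, divmod(total, 3))

# Positions covered by more than one segment are read twice by the ribosome
seen, repeated = set(), []
for start, end in segments:
    for pos in range(start, end + 1):
        if pos in seen:
            repeated.append(pos)
        seen.add(pos)
print(repeated)

# Second exon of NM_015068.3 from the cdot JSON (gap field dropped).
# Assumed layout: (genomic start 0-based, genomic end, exon index,
#                  transcript start 1-based, transcript end); plus strand only.
exon2 = (94292645, 94299007, 1, 257, 6618)
alt_start, _, _, tx_start, tx_end = exon2

genomic = []
for pos in repeated:
    assert tx_start <= pos <= tx_end, "position is not in exon 2"
    genomic.append(alt_start + (pos - tx_start) + 1)  # 1-based genomic position
print(genomic)
```

Under that assumption, transcript position 1436 corresponds to NC_000007.13:94293825, and the two segments together are 2127 bases (709 codons), whereas the contiguous span 480..2605 is only 2126 bases because it counts that base once.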
The original GFF3 says:

```
egrep -w 'ID=gene14272|Parent=rna16954|NM_015068.3' ref_GRCh37.p10_top_level.gff3
NC_000007.13 RefSeq gene 94285637 94299007 . + . ID=gene14272;Name=PEG10;Dbxref=GeneID:23089,HGNC:14005,MIM:609810;description=paternally expressed 10;gbkey=Gene;gene=PEG10;gene_synonym=EDR,HB-1,Mar2,Mart2,MEF3L,RGAG3
NC_000007.13 RefSeq mRNA 94285637 94299007 . + . ID=rna16953;Name=NM_015068.3;Parent=gene14272;Dbxref=GeneID:23089,Genbank:NM_015068.3,HGNC:14005,MIM:609810;gbkey=mRNA;gene=PEG10;product=paternally expressed 10%2C transcript variant 1;transcript_id=NM_015068.3
NC_000007.13 RefSeq exon 94285682 94285903 . + . ID=id161180;Parent=rna16954;Dbxref=GeneID:23089,Genbank:NM_001172437.1,HGNC:14005,MIM:609810;gbkey=mRNA;gene=PEG10;product=paternally expressed 10%2C transcript variant 2;transcript_id=NM_001172437.1
NC_000007.13 RefSeq exon 94292646 94299007 . + . ID=id161181;Parent=rna16954;Dbxref=GeneID:23089,Genbank:NM_001172437.1,HGNC:14005,MIM:609810;gbkey=mRNA;gene=PEG10;product=paternally expressed 10%2C transcript variant 2;transcript_id=NM_001172437.1
NC_000007.13 RefSeq CDS 94285899 94285903 . + 0 ID=cds13063;Name=NP_001165908.1;Parent=rna16954;Note=isoform 3 is encoded by transcript variant 2;Dbxref=GeneID:23089,Genbank:NP_001165908.1,HGNC:14005,MIM:609810;exception=ribosomal slippage;gbkey=CDS;product=retrotransposon-derived protein PEG10 isoform 3;protein_id=NP_001165908.1
NC_000007.13 RefSeq CDS 94292646 94293825 . + 1 ID=cds13063;Name=NP_001165908.1;Parent=rna16954;Note=isoform 3 is encoded by transcript variant 2;Dbxref=GeneID:23089,Genbank:NP_001165908.1,HGNC:14005,MIM:609810;exception=ribosomal slippage;gbkey=CDS;product=retrotransposon-derived protein PEG10 isoform 3;protein_id=NP_001165908.1
NC_000007.13 RefSeq CDS 94293825 94294994 . + 0 ID=cds13063;Name=NP_001165908.1;Parent=rna16954;Note=isoform 3 is encoded by transcript variant 2;Dbxref=GeneID:23089,Genbank:NP_001165908.1,HGNC:14005,MIM:609810;exception=ribosomal slippage;gbkey=CDS;product=retrotransposon-derived protein PEG10 isoform 3;protein_id=NP_001165908.1
```
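To pull the same information out of the GFF3, the sketch below groups CDS rows by `Parent`, sums their lengths and reports parts that overlap the previous one. The attribute column is shortened to the two keys the parser needs. Note that the rows shown carry `Parent=rna16954`, which the exon rows identify as NM_001172437.1 (transcript variant 2); the mRNA row for NM_015068.3 has `ID=rna16953`, so its CDS rows would presumably have to be selected with `Parent=rna16953` instead.

```python
from collections import defaultdict

# CDS rows from the GFF3 output above (tab-separated, 9 columns);
# the attribute column is shortened to the ID and Parent keys.
gff_lines = [
    "NC_000007.13\tRefSeq\tCDS\t94285899\t94285903\t.\t+\t0\tID=cds13063;Parent=rna16954",
    "NC_000007.13\tRefSeq\tCDS\t94292646\t94293825\t.\t+\t1\tID=cds13063;Parent=rna16954",
    "NC_000007.13\tRefSeq\tCDS\t94293825\t94294994\t.\t+\t0\tID=cds13063;Parent=rna16954",
]

# Collect 1-based, inclusive (start, end) CDS parts per parent transcript
cds_by_parent = defaultdict(list)
for line in gff_lines:
    cols = line.rstrip("\n").split("\t")
    if cols[2] != "CDS":
        continue
    attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if kv)
    cds_by_parent[attrs["Parent"]].append((int(cols[3]), int(cols[4])))

for parent, parts in cds_by_parent.items():
    parts.sort()
    total = sum(end - start + 1 for start, end in parts)
    # A part starting at or before the previous part's end overlaps it;
    # with "exception=ribosomal slippage" those bases are read twice.
    overlaps = [
        (start, prev_end)
        for (_, prev_end), (start, _) in zip(parts, parts[1:])
        if start <= prev_end
    ]
    print(parent, total, total % 3, overlaps)
```

For these rows it prints `rna16954 2355 0 [(94293825, 94293825)]`: the two exon-2 parts share NC_000007.13:94293825, and only by counting that base twice does the CDS come out as a whole number of codons.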