A comprehensive SARS-CoV-2 genomic analysis among Indians to identify conserved regions and mutational hotspots for improved diagnostics
Supplementary data: the data collected from various tools, such as the MSA from MAFFT, Jalview, and the NCBI SARS-CoV-2 Data Hub, along with the 1,026 sequences, the phylogenetic tree files, and Nextclade and Auspice.
Using the consensus of several sequence-matching approaches, this work illustrates the genetic heterogeneity of Indian SARS-CoV-2 genomes. Phylogenetic analysis of the whole-genome sequences with Nextstrain showed twelve prominent SARS-CoV-2 clades prevailing in the Indian region: 19A, 19B, 20A, 20B, 20C, 20G, 20H, 20I, 21A, 21I, 21J, and 21B. Furthermore, an alignment of 1,026 Indian SARS-CoV-2 genome sequences, including reference genome sequences, revealed 27 unique alterations, deletions, and SNPs. These are non-synonymous mutations. The causes of these prevalent mutations should be investigated further in future work. The search for SNPs, in turn, aims to identify genomic locations that can be used to classify virus strains in India. Once the virus strain has been identified, an appropriate vaccine can be employed. In the future, these SNPs could be used to model proteins so that medications targeting them can be developed, and they could also be useful in diagnostic studies.
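The idea of locating variable positions and conserved stretches in an alignment can be illustrated with a minimal Python sketch. It is not the pipeline used in this work, which relied on MAFFT, Jalview, and Nextclade; it only assumes an aligned FASTA file (for example, one exported from MAFFT) in which every sequence has the same length. The function names `parse_aligned_fasta`, `find_variants`, and `conserved_regions` are hypothetical, and positions are alignment columns, not necessarily reference genome coordinates.

```python
from collections import Counter


def parse_aligned_fasta(text):
    """Parse aligned FASTA text into an ordered {name: sequence} dict."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def find_variants(records, ref_name):
    """List columns where at least one sequence differs from the reference.

    Returns tuples (column, reference_character, Counter of differing characters).
    Columns are 1-based. A '-' marks a gap (deletion relative to the reference,
    or an insertion when the reference itself has '-'). 'N' is treated as an
    ambiguous call and ignored (an assumption of this sketch).
    """
    ref = records[ref_name]
    if any(len(seq) != len(ref) for seq in records.values()):
        raise ValueError("all sequences in the alignment must have equal length")
    variants = []
    for col, ref_base in enumerate(ref):
        alts = Counter()
        for name, seq in records.items():
            if name == ref_name:
                continue
            base = seq[col]
            if base != ref_base and base != "N":
                alts[base] += 1
        if alts:
            variants.append((col + 1, ref_base, alts))
    return variants


def conserved_regions(n_columns, variants, min_length):
    """Return (start, end) column ranges, 1-based inclusive, with no variants."""
    variable = {pos for pos, _, _ in variants}
    regions = []
    start = None
    # Iterate one column past the end so a trailing run is closed as well.
    for pos in range(1, n_columns + 2):
        if pos <= n_columns and pos not in variable:
            if start is None:
                start = pos
        else:
            if start is not None and pos - start >= min_length:
                regions.append((start, pos - 1))
            start = None
    return regions
```

Applied to a real alignment, the variable columns would be candidate SNP or deletion sites, and the conserved ranges would be candidate regions for diagnostic primer design. Whether a change is synonymous or non-synonymous would still require translating the affected codons, which this sketch does not attempt.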
Minor Project Data
Submitted in fulfilment of the requirements for the award of the degree of Master of Technology (Biotechnology)
SCHOOL OF BIOTECHNOLOGY GAUTAM BUDDHA UNIVERSITY GREATER NOIDA
January 2022