This track shows the alignment of three mRNA vaccine sequences to the SARS-CoV-2 genome. In the alignment output they are named ModernaMrna1273, ReconstructedBNT162b2, and WHO_BNT162b2.
Note that the actual vaccines are synthesized with N1-methyl-pseudouridine (Ψ) in place of uridine. See the article by Bert Hubert in the References for a discussion.
The PSL output from blat was converted to a bigPsl file for display in this track. Depending on how much of the genome is in view, the track draws black where the nucleotides of the vaccine sequence and the SARS-CoV-2 sequence are identical; red lines indicate nucleotide differences. When a smaller section of the genome is in view, setting the "Color track by codons or bases:" option to "different mRNA bases" shows the vaccine nucleotides that differ from the SARS-CoV-2 sequence.
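The coloring rule described above amounts to a column-by-column comparison of the aligned vaccine and genome bases. The following Python sketch illustrates that idea only; it is not the browser's drawing code. The helper name `classify_columns` is hypothetical, the two short sequences are made up for illustration (they are not taken from the vaccines or the genome), and an ungapped alignment is assumed.

```python
def classify_columns(genome, vaccine):
    """Return one 'black' or 'red' label per aligned base.

    'black' marks an identical base and 'red' marks a difference,
    mirroring the display convention described in the text.
    Assumption: the two strings are already aligned without gaps.
    """
    if len(genome) != len(vaccine):
        raise ValueError("aligned sequences must have equal length")
    return ["black" if g == v else "red"
            for g, v in zip(genome.upper(), vaccine.upper())]


# Hypothetical toy sequences, for illustration only.
genome_bases = "ATGTTTGTT"
vaccine_bases = "ATGTTCGTG"

colors = classify_columns(genome_bases, vaccine_bases)

# 1-based positions where the vaccine base differs from the genome base.
mismatches = [i + 1 for i, c in enumerate(colors) if c == "red"]
print(mismatches)
```

In this toy case the differences fall on the third base of two codons, which is the kind of synonymous recoding the track is meant to make visible.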
The mRNA sequences were obtained from the Microsoft Word documents mentioned in the References below. The Andrew Fire lab GitHub repository supplied the FASTA sequencing results for the BioNTech/Pfizer BNT-162b2 and Moderna mRNA-1273 samples.
The PSL alignment file was obtained from the UCSC Genome Browser blat service with the parameters -t=dnax -q=rnax. Only alignments scoring above 1000 were kept, which removes the polyA match:
    gfClient -maxIntron=10 -t=dnax -q=rnax <host> <port> \
        /gbdb/wuhCor1 threeVaccines.fa stdout \
        | pslFilter -minScore=1000 stdin wuhCor1.vaccines.psl

    pslScore wuhCor1.vaccines.psl
    #tName       tStart  tEnd   qName:qStart-qEnd              score  percentIdent
    NC_045512v2  21559   25384  ModernaMrna1273:54-3879        1419   68.60
    NC_045512v2  21559   25384  ReconstructedBNT162b2:51-3876  1701   72.30
    NC_045512v2  21559   25384  WHO_BNT162b2:51-3876           1701   72.30

    faCount threeVaccines.fa | tawk '{print $1,"1.."$2+1}' \
        | head -4 | tail -3 > threeVaccines.cds

    pslToBigPsl -cds=threeVaccines.cds -fa=threeVaccines.fa wuhCor1.vaccines.psl stdout \
        | sort -k1,1 -k2,2n > wuhCor1.vaccines.bigPsl

    bedToBigBed -type=bed12+13 -tab -as=HOME/kent/src/hg/lib/bigPsl.as \
        wuhCor1.vaccines.bigPsl wuhCor1.chrom.sizes wuhCor1.vaccines.bb
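To make the score filter and the pslScore listing easier to check, the rows above can be parsed and compared in a few lines of Python. This is a sketch for illustration, not part of the track's build procedure: the names `ScoreRow` and `parse_psl_score` are hypothetical, the rows are copied from the listing, and the strict `> 1000` comparison follows the wording "scores above 1000" rather than any documented pslFilter boundary behavior.

```python
from typing import NamedTuple


class ScoreRow(NamedTuple):
    t_name: str
    t_start: int
    t_end: int
    q_name: str
    q_start: int
    q_end: int
    score: int
    percent_ident: float


def parse_psl_score(text):
    """Parse whitespace-separated pslScore rows, skipping '#' header lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        t_name, t_start, t_end, query, score, ident = line.split()
        q_name, q_range = query.rsplit(":", 1)   # "name:start-end"
        q_start, q_end = q_range.split("-")
        rows.append(ScoreRow(t_name, int(t_start), int(t_end), q_name,
                             int(q_start), int(q_end), int(score), float(ident)))
    return rows


# Rows copied from the pslScore listing in the text.
PSL_SCORE_OUTPUT = """\
#tName tStart tEnd qName:qStart-qEnd score percentIdent
NC_045512v2 21559 25384 ModernaMrna1273:54-3879 1419 68.60
NC_045512v2 21559 25384 ReconstructedBNT162b2:51-3876 1701 72.30
NC_045512v2 21559 25384 WHO_BNT162b2:51-3876 1701 72.30
"""

rows = parse_psl_score(PSL_SCORE_OUTPUT)

# Assumption: "above 1000" is read as a strict comparison.
kept = [r for r in rows if r.score > 1000]

for r in kept:
    q_span = r.q_end - r.q_start
    t_span = r.t_end - r.t_start
    # Report the aligned query span, target span, and span in codons.
    print(r.q_name, r.score, q_span, t_span, q_span // 3)
```

All three alignments cover 3825 bases on both the vaccine sequence and the genome, that is 1275 codons, which appears to match the 1275 characters of the protein listing given later on this page.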
The fasta file sequences and psl alignment file can be obtained from our download server at: https://hgdownload.soe.ucsc.edu/goldenPath/$db/vaccines/.
The bigPsl alignment file used for the display of this track in the genome browser can be accessed from https://hgdownload.soe.ucsc.edu/gbdb/$db/bbi/$db.vaccines.bb. The file can be read with the kent command-line tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading the source code and binaries are available on the UCSC download server.
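As an illustration of the access route described above, the following Python sketch assembles a bigBedToBed command line for the spike region. It rests on several assumptions: that `$db` in the URL stands for wuhCor1 (inferred from the file names in the build commands), that the region of interest is the aligned interval NC_045512v2:21559-25384 from the pslScore listing, and that bigBedToBed is installed on your PATH. The helper names `bigbed_to_bed_command` and `fetch_bed_lines` are hypothetical.

```python
import shutil
import subprocess

# URL pattern from the text, with $db written as a Python placeholder.
BB_URL_TEMPLATE = "https://hgdownload.soe.ucsc.edu/gbdb/{db}/bbi/{db}.vaccines.bb"


def bigbed_to_bed_command(db, output="stdout", chrom=None, start=None, end=None):
    """Build the argument list for a bigBedToBed call.

    -chrom, -start and -end restrict the output to one region;
    'stdout' as the output name writes the BED lines to standard output.
    """
    cmd = ["bigBedToBed"]
    if chrom is not None:
        cmd.append(f"-chrom={chrom}")
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    cmd.append(BB_URL_TEMPLATE.format(db=db))
    cmd.append(output)
    return cmd


def fetch_bed_lines(cmd):
    """Run the command if bigBedToBed is installed; return None otherwise."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Assumption: $db is wuhCor1, as suggested by the build commands.
command = bigbed_to_bed_command("wuhCor1", chrom="NC_045512v2",
                                start=21559, end=25384)
print(" ".join(command))

# Uncomment to run the download (needs network access and the kent binary):
# lines = fetch_bed_lines(command)
```

Building the argument list separately from running it keeps the command easy to inspect or paste into a shell before any network request is made.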
The protein encoded by the three sequences has two amino acid substitutions relative to the SARS-CoV-2 S glycoprotein: S:K986P and S:V987P. See also "The tiny tweak behind COVID-19 vaccines" (Ryan Cross, in the References).
>BNT162b2
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFD
NPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVY
SSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQT
LLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRV
QPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSF
VIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPC
NGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFL
PFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGS
NVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI
SVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGF
NFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAG
TITSGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALN
TLVKQLSSNFGAISSVLNDILSRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRV
DFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT
FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDL
QELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYTZZ
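The two proline substitutions can be located in the listing above with a short Python sketch. It assumes the listing is wrapped at 80 residues per line, so that its 13th sequence line covers residues 961 to 1040; only that line is copied into the code. The helper names `residue_at` and `parse_substitution` are hypothetical.

```python
import re

RESIDUES_PER_LINE = 80     # assumption: the listing is wrapped at 80 residues
LISTING_LINE_NUMBER = 13   # 13th sequence line of the BNT162b2 listing

# That line, copied from the listing (residues 961-1040 under the assumption).
LISTING_LINE = ("TLVKQLSSNFGAISSVLNDILSRLDPPEAEVQIDRLITGRLQSLQTYVTQQ"
                "LIRAAEIRASANLAATKMSECVLGQSKRV")


def residue_at(position):
    """Return the residue at a 1-based protein position within the copied line."""
    first = (LISTING_LINE_NUMBER - 1) * RESIDUES_PER_LINE + 1
    offset = position - first
    if not 0 <= offset < len(LISTING_LINE):
        raise ValueError(f"position {position} is outside the copied line")
    return LISTING_LINE[offset]


def parse_substitution(label):
    """Split a label such as 'S:K986P' into (reference, position, variant)."""
    match = re.fullmatch(r"S:([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


for label in ("S:K986P", "S:V987P"):
    ref, pos, alt = parse_substitution(label)
    observed = residue_at(pos)
    # True when the listing carries the variant residue at that position.
    print(label, observed, observed == alt)
```

Both positions carry proline in the listing, within the segment SRLDPPEAEV.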
Dae Eun Jeong, Matthew McCoy, Karen Artiles, Orkan Ilbay, Andrew Fire, Kari Nadeau, Helen Park, Brooke Betts, Scott Boyd, Ramona Hoh, and Massa Shoura. Assemblies of putative SARS-CoV2-spike-encoding mRNA sequences for vaccines BNT-162b2 and mRNA-1273. Obtained from GitHub.
Bert Hubert. Reverse Engineering the source code of the BioNTech/Pfizer SARS-CoV-2 Vaccine. 25 Dec 2020.
Wikipedia. Pfizer-BioNTech COVID-19 vaccine.
World Health Organization MedNet. Messenger RNA encoding the full-length SARS-CoV-2 spike glycoprotein. Sept. 2020, document 11889.
Cyril Le Nouën, Peter L. Collins, and Ursula J. Buchholz. Attenuation of Human Respiratory Viruses by Synonymous Genome Recoding. Frontiers in Immunology 2019; 10: 1250. PMID: 31231383.
Ryan Cross. The tiny tweak behind COVID-19 vaccines. Chemical & Engineering News, 29 September 2020, Vol. 98, issue 38.
Thank you to the Andrew Fire lab, Stanford University School of Medicine, for providing the sequencing data of these vaccines.
The presentation of this track was prepared by Hiram Clawson (hclawson@ucsc.edu).