Which form of dengue evolved first?

We know that the dengue virus comes from mosquitoes and has four different forms but which of its four forms arose first? This question is answered below, intentionally in a long form to illustrate basic concepts in Bioinformatics and how python can be effectively used for different types of analysis in Bioinformatics.

The impact of dengue

Dengue fever is a mosquito borne disease caused by the dengue virus (DENV). It occurs in about 100 countries in the world and about 3 Billion people (almost half the world's population!) is at risk. A 2013 paper in Nature estimates that DENV infects about 390 Million people in the world every year, of which about 96 Million manifest apparently any level of clinical or sub-clinical severity. The mortality estimate is 1-5% without treatment and less than 1% with adequate treatment. However, the more severe form of dengue fever called the dengue hemorrhagic fever which occurs in less than 5% of all dengue cases , carries a mortality rate of about 26%. The virus that causes dengue disease is divided into four closely related serotypes (dengue virus 1, 2, 3 and 4). Researchers have shown that the more dangeous hemorrhagic fever occurs when a person, after being infected with one serotype, gets infected with another one. The wikipedia picture below shows a TEM micrograph with dengue virus virions (the cluster of dark dots near the center, each about 50nm in size). Below that the structure of the outer shell of a single DENV virus is shown (from PDB). alt text alt text

Flaviviruses

After the scary and alarming description above, now we come to the main question: Where did the Dengue virus come from? To answer this question, we will go through some basics. Dengue viruses belong to a virus family called Flaviviridae which includes, among other viruses, the West Nile and Hepatitis viruses. To further the impact this virus family has on humans, note that about 2-3% of the world population (this number is equal to Pakistan's current population!) is infected with Hepatitis C. No vaccine exists either for Dengue or Hepatitis C and this presents a huge risk.

Biologically, these viruses are RNA viruses, i.e., they use RNA as their genetic material instead of DNA (as in humans). You can thus think of the genetic material of these viruses as a 4 character alphabet (the 4 nucleotide bases A,C,G and U) string with length of around 11,000. Theirs is a really small genome in comparison to, say, humans (3.2 Billion DNA base pairs) and the canopy plant (150 Billion DNA base pairs). The genomic sequence of an organism can be found through genomic sequencing technologies. Off the topic: It is interesting to note that each of the 100 trillion or so cells that humans have contains a dust particle sized cell nucleus (typical diameter of 2-10 microns) a small part of which (chromosomes) stores 3.2 Billion base pairs of pragmatically and simultaneously accessible genetic information. Now, that is some serious information density!

RNA and proteins

Biological cells and viruses function by using the RNA as a pseudo-template to generate proteins through the conceptually straight-forward process of translation. Proteins are large molecules made up of long linear chain(s) of amino acid residues which folds into a 3D structure. With a tilt of the hat to DNA and RNA, proteins are the most important biological molecule as all vital cellular or viral processes ultimately involve proteins. For example, the translation of RNA into proteins is itself made possible through proteins. The outer shell of the virus is also made up a mosaic of a complex of three proteins. In flaviviridae, the replication of the virus involves two proteins called NS3 and NS5. To appreciate the importance of proteins in general, note that 50% of the dry weight of the human body is protein. As a computer scientist, you can think of a protein as an arbitrary length string of 20 characters (the 20 amino acids). For example, the NS3 protein in DENV is 619 amino acids long.

Since all proteins are coded through RNA, so theretically, the protein sequence contains the same information as the coresponding RNA sequence. As a matter of fact, the sequence of all proteins in a virus (called the polyprotein) is often used instead of the genome.

The proteins of dengue

Below, we first display, using Python's Biopython package, the polyprotein sequence of DENV downloaded from uniprot as a FASTA file.

In [1]:
from Bio import SeqIO
import urllib2 as url

def downloadFile(url,ofname=None):
    import urllib2
    if ofname is None:
        ofname = url.split('/')[-1]
    u = urllib2.urlopen(url)
    with  open(ofname, 'wb') as f:
        meta = u.info()
        file_size = int(meta.getheaders("Content-Length")[0])
        print "Downloading: %s Bytes: %s" % (ofname, file_size)
        
        file_size_dl = 0
        block_sz = 8192
        while True:
            buffer = u.read(block_sz)
            if not buffer:
                break        
            file_size_dl += len(buffer)
            f.write(buffer)
            status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
            status = status + chr(8)*(len(status)+1)
            print status
    return ofname

def readFASTA(fname):
    """
    Reads the fasta file fname and returns the sequence object
    """
    with open(fname, "rU") as handle:
    	record = list(SeqIO.parse(handle, "fasta"))
    return record

url = 'http://www.uniprot.org/uniprot/P29990.fasta'
records = readFASTA(downloadFile(url))
print "\nThere are",len(records),"records in this file\n"
# Read the first one
r = records[0]
print "The length of dengue polyprotein is: %i \n" %len(r)
print "The name of the record is:",r.name,"\n"
print "The description of the record is:",r.description,"\n"
print "The polyprotein sequence is shown below:\n"

print str(r.seq)
Downloading: P29990.fasta Bytes: 3551
      3551  [100.00%]

There are 1 records in this file

The length of dengue polyprotein is: 3391 

The name of the record is: sp|P29990|POLG_DEN26 

The description of the record is: sp|P29990|POLG_DEN26 Genome polyprotein OS=Dengue virus type 2 (strain Thailand/16681/1984) PE=1 SV=1 

The polyprotein sequence is shown below:

MNDQRKKAKNTPFNMLKRERNRVSTVQQLTKRFSLGMLQGRGPLKLYMALVAFLRFLTIPPTAGILKRWGTIKKSKAINVLRGFRKEIGRMLNILNRRRRSAGMIIMLIPTVMAFHLTTRNGEPHMIVSRQEKGKSLLFKTEDGVNMCTLMAMDLGELCEDTITYKCPLLRQNEPEDIDCWCNSTSTWVTYGTCTTMGEHRRQKRSVALVPHVGMGLETRTETWMSSEGAWKHVQRIETWILRHPGFTMMAAILAYTIGTTHFQRALIFILLTAVTPSMTMRCIGMSNRDFVEGVSGGSWVDIVLEHGSCVTTMAKNKPTLDFELIKTEAKQPATLRKYCIEAKLTNTTTESRCPTQGEPSLNEEQDKRFVCKHSMVDRGWGNGCGLFGKGGIVTCAMFRCKKNMEGKVVQPENLEYTIVITPHSGEEHAVGNDTGKHGKEIKITPQSSTTEAELTGYGTVTMECSPRTGLDFNEMVLLQMENKAWLVHRQWFLDLPLPWLPGADTQGSNWIQKETLVTFKNPHAKKQDVVVLGSQEGAMHTALTGATEIQMSSGNLLFTGHLKCRLRMDKLQLKGMSYSMCTGKFKVVKEIAETQHGTIVIRVQYEGDGSPCKIPFEIMDLEKRHVLGRLITVNPIVTEKDSPVNIEAEPPFGDSYIIIGVEPGQLKLNWFKKGSSIGQMFETTMRGAKRMAILGDTAWDFGSLGGVFTSIGKALHQVFGAIYGAAFSGVSWTMKILIGVIITWIGMNSRSTSLSVTLVLVGIVTLYLGVMVQADSGCVVSWKNKELKCGSGIFITDNVHTWTEQYKFQPESPSKLASAIQKAHEEGICGIRSVTRLENLMWKQITPELNHILSENEVKLTIMTGDIKGIMQAGKRSLRPQPTELKYSWKTWGKAKMLSTESHNQTFLIDGPETAECPNTNRAWNSLEVEDYGFGVFTTNIWLKLKEKQDVFCDSKLMSAAIKDNRAVHADMGYWIESALNDTWKIEKASFIEVKNCHWPKSHTLWSNGVLESEMIIPKNLAGPVSQHNYRPGYHTQITGPWHLGKLEMDFDFCDGTTVVVTEDCGNRGPSLRTTTASGKLITEWCCRSCTLPPLRYRGEDGCWYGMEIRPLKEKEENLVNSLVTAGHGQVDNFSLGVLGMALFLEEMLRTRVGTKHAILLVAVSFVTLIIGNMSFRDLGRVMVMVGATMTDDIGMGVTYLALLAAFKVRPTFAAGLLLRKLTSKALMMTTIGIVLSSQSTTPETILELTDALALGMMVLKMVRNMEKYQLAVTIMAILCVPNAVILQNAWKVSCTILAVVSVSPLFLTSSQQKTDWIPLALTIKGLNPTAIFLTTLSRTSKKRSWPLNEAIMAVGMVSILASSLLKNDIPMTGPLVAGGPLTVCYVLTGRSADLELERAADVKWEDQAEISGSSPILSITISEDGSMSIKNEEEEQTLTILIRTGLLVISGLFPVSIPITAAAWYLWEVKKQRAGVLWDVPSPPPMGKAELEDGAYRIKQKGILGYSQIGAGVYKEGTFHTMWHVTRGAVLMHKGKRIEPSWADVKKDLISYGGGWKLEGEWKEGEEVQVLALEPGKNPRAVQTKPGLFKTNAGTIGAVSLDFSPGTSGSPIIDKKGKVVGLYGNGVVTRSGAYVSAIAQTEKSIEDNPEIEDDIFRKRRLTIMDLHPGAGKTKRYLPAIVREAIKRGLRTLILAPTRVVAAEMEEALRGLPIRYQTPAIRAEHTGREIVDLMCHATFTMRLLSPVRVPNYNLIIMDEAHFTDPASIAARGYISTRVEMGEAAGIFMTATPPGSRDPFPQSNAPIIDEEREIPERSWNSGHEWVTDFKGKTVWFVPSIKAGNDIAACLSKNGKKVIQLSRKTFDSEYAKTRTNDWDFVVTTDISEMGANFKAERVIDPRRCMKPVILTDGEERVILAGPMPVTHSSAAQRRGRIGRNPKNENDQYIYMGEPLENDEDCAHWKEAKMLLDNINTPEGIIPSMFEPEREKVDAIDGEYRLRGEARTTFVDLMRRGDLPVWLAYRVAAEGINYADRRWCFDGVKNNQILEENVEVEIWTKEGERKKLKPRWLDARIYSDPLALKEFKEFAAGRKSLTLNLITEMGRLPTFMTQKARDALDNLAVLHTAEAGGRAYNHALSELPETLETLLLLTLLATVTGGILLFLMSGRGIGKMTLGMCCIITASILLWYAQIQPHWIAASIILEFFLIVLLIPEPEKQRTPQDNQLTYVVIAILTVVAATMANEMGFLEKTKKDLGLGSIATQQPESNILDIDLRPASAWTLYAVATTFVTPMLRHSIENSSVNVSLTAIANQATVLMGLGKGWPLSKMDIGVPLLAIGCYSQVNPTTLTAALFLLVAHYAIIGPALQAKASREAQKRAAAGIMKNPTVDGITVIDLDPIPYDPKFEKQLGQVMLLVLCVTQVLMMRTTWALCEVLTLATGPISTLWEGNPGRFWNTTIAVSMANIFRGSYLAGAGLLFSIMKNTTNARRGTGNIGETLGEKWKSRLNALGKSEFQIYKKSGIQEVDRTLAKEGIKRGETDHHAVSRGSAKLRWFVERNMVTPEGKVVDLGCGRGGWSYYCGGLKNVREVKGLTKGGPGHEEPIPMSTYGWNLVRLQSGVDVFFIPPEKCDTLLCDIGESSPNPTVEAGRTLRVLNLVENWLNNNTQFCIKVLNPYMPSVIEKMEALQRKYGGALVRNPLSRNSTHEMYWVSNASGNIVSSVNMISRMLINRFTMRYKKATYEPDVDLGSGTRNIGIESEIPNLDIIGKRIEKIKQEHETSWHYDQDHPYKTWAYHGSYETKQTGSASSMVNGVFRLLTKPWDVVPMVTQMAMTDTTPFGQQRVFKEKVDTRTQEPKEGTKKLMKITAEWLWKELGKKKTPRMCTREEFTRKVRSNAALGAIFTDENKWKSAREAVEDSRFWELVDKERNLHLEGKCETCVYNIMGKREKKLGEFGKAKGSRAIWYMWLGARFLEFEALGFLNEDHWFSRENSLSGVEGEGLHKLGYILRDVSKKEGGAMYADDTAGWDTRITLEDLKNEAMVTNHMEGEHKKLAEAIFKLTYQNKVVRVQRPTPRGTVMDIISRRDQRGSGQVGTYGLNTFTNMEAQLIRQMEGEGVFKSIQHLTITEEIAVQNWLARVGRERLSRMAISGDDCVVKPLDDRLPSALTALNDTGKIRKDIQQWEPSRGWNDWTQVPFCSHHFHELIMKDGRVLVVPCRNQDELIGRARISQGAGWSLRETACLGKSYDQMWSLMYFHRRDLRLAANAICSAVPSHWVPTSRTTWSIHAKHEWMTTEDMLTVWNRVWIQENPWMEDKTPVESWEEIPYLGKREDQWCGSLIGLTSRATWAKNIQAAINQVRSLIGNEEYTDYMPSMKRFRREEEEAGVLW

The NS3 proteins of different flaviviruses

We can also view the protein sequnces of NS3 proteins from different flaviviruses stored in the file NS3.fasta.

In [2]:
fname='NS3.fasta'
records = readFASTA(fname)
for r in records:
    print r.id, 'length is', len(r)
    print
    print str(r.seq)
    print
West_Nile_Culex length is 619

GGVLWDTPSPKEYKKGDTTTGVYRIMTRGLLGSYQAGAGVMVEGVFHTLWHTTKGAALMSGEGRLDPYWGSVKEDRLCYGGPWKLQHKWNGHDEVQMIVVEPGKNVKNVQTKPGVFKTPEGEIGAVTLDYPTGTSGSPIVDKNGDVIGLYGNGVIMPNGSYISAIVQGERMEEPAPAGFEPEMLRKKQITVLDLHPGAGKTRKILPQIIKEAINKRLRTAVLAPTRVVAAEMSEALRGLPIRYQTSAVHREHSGNEIVDVMCHATLTHRLMSPHRVPNYNLFIMDEAHFTDPASIAARGYIATKVELGEAAAIFMTATPPGTSDPFPESNAPISDMQTEIPDRAWNTGYEWITEYVGKTVWFVPSVKMGNEIALCLQRAGKKVIQLNRKSYETEYPKCKNDDWDFVITTDISEMGANFKASRVIDSRKSVKPTIIEEGDGRVILGEPSAITAASAAQRRGRIGRNPSQVGDEYCYGGHTNEDDSNFAHWTEARIMLDNINMPNGLVAQLYQPEREKVYTMDGEYRLRGEERKNFLEFLRTADLPVWLAYKVAAAGISYHDRKWCFDGPRTNTILEDNNEVEVITKLGERKILRPRWADARVYSDHQALKSFKDFASGKR

Japanese_encephalitis_Culex length is 619

GVFWDTPSPKPCSKGDTTTGVYRIMARGILGTYQAGVGVMYENVFHTLWHTTRGAAIMSGEGKLTPYWGSVKEDRIAYGGPWRFDRKWNGTDDVQVIVVEPGKAAVNIQTKPGVFRTPFGEVGAVSLDYPRGTSGSPILDSNGDIIGLYGNGVELGDGSYVSAIVQGDRQEEPVPEAYTPNMLRKRQMTVLDLHPGSGKTRKILPQIIKDAIQQRLRTAVLAPTRVVAAEMAEALRGLPVRYQTSAVQREHQGNEIVDVMCHATLTHRLMSPNRVPNYNLFVMDEAHFTDPASIAARGYIATKVELGEAAAIFMTATPPGTTDPFPDSNAPIHDLQDEIPDRAWSSGYEWITEYAGKTVWFVASVKMGNEIAMCLQRAGKKVIQLNRKSYDTEYPKCKNGDWDFVITTDISEMGANFGASRVIDCRKSVKPTILEEGEGRVILGNPSPITSASAAQRRGRVGRNPNQVGDEYHYGGATSEDDSNLAHWTEAKIMLDNIHMPNGLVAQLYGPEREKAFTMDGEYRLRGEEKKNFLELLRTADLPVWLAYKVASNGIQYTDRRWCFDGPRTNAILEDNTEVEIVTRMGERKILKPRWLDARVYADHQALKWFKDFAAGKRR

Bagaza_Aedes_Culex_Others length is 619

SGAIWDVPAPKEKKRAELSTGVFRIMARGILGKYQAGVGVMFDGVFHTMWHVTNGATINVGDGCLVPYWASVRDDLISYGGPWKMGTKWNGVDEVQVIVVQPGKSVMNVQTKPGVFKTSTGEIGAVTLDYPTGTSGSPIVDKEGNIIGLYGNGILVGAGSFVSMISQGQRMEEEIPQVFSEDMLRKRQLTVLDLHPGSGKTRKVLPQIVKTAIDRRLRTAILAPTRVVAAEIAEALKEYPIRYLTPAVKREHTGTEIVDVMCHATLTSRLLTPQRVPNYNLFVMDEAHFTDPASIAARGYISTKVELGEAAAIFMTATPPGTRDPFPDSNSPIMDVEEQIPDRAWNSGYEWITDYTGKTVWFVPSVKMGNEIAVCLTKAGKKVIQLNRKSFDSEYPKCKTGEWDFVITTDISEMGANFGASRVIDSRKCIKPVIIEDGEGSVQLNGPVPITAASAAQRRGRIGRSYVQVGDEYHFSGPTSEDDHDFAHWKEAKILLDNINLPNGLVAQLYEPEREKVFSIDGEYRLRTEQRKNFVEFLRTGDLPVWLSYKLAEAGVAYHDRKWCFDGPSINTVLEDNNPVELWTKSGEKKILRPRWRDGRLWADHQALKSFKEFACGKR

Apoi_Rodents length is 616

SNTSFSLITDILESDDHESPNEGIYRIFASGVLGKKQVGVGIWKQGVFHTMWHVTRGSVLSINGRRFQPEWASVRDDLISYNGPWRLPAKWEGGEVQLHGYTPDGKVQVTQLLPGRMVTDGGEELGLIPIDHPPGTSGSPIITKDGHVVGLYGNGVVHGETYCSSIAQANVPAETEPEVLKTNWTSKGKLTVIDAHPGSGKTKKILPRLVDEAVGRKLRTLVLAPTRVVVKEMEQALRGKPVSFHSSAASKKVPGALVDVMCHATYVQRRMMPIPQKNYELVIMDEAHWTDPSSIAARGFLDSQRKENHLSLVLMTATPPGTDDPWPQSNAPINDNEKIIPEGDWKTGHDWITEFEGKTVWFVPSLKAGATIAKTLRGFGKSTAVLSSKTFSEVYPTLRDSKPDFILTTDISEMGANFDVDRVIDSRTTLKPIEVGNCVDISGERAVTPASAAQRRGRVGREEGRQAEYVYQGEVDPDDGDLVCWKEAQILLDNMEGRFKTASVFYEPEITKMPDTPGYFKLPVEKKKYFRHLISNCDFTPWLAWKVASETKGVEDRGWVSLGPKEHEIADHEGELIKFKSPSGRENILRPVWLDNRMIREKRDLEALIDYAKGKR

Dengue4_Aedes length is 618

SGALWDVPSPAATQKAALSEGVYRIMQRGLFGKTQVGVGIHMEGVFHTMWHVTRGSVICHETGRLEPSWADVRNDMISYGGGWRLGDKWDKEEDVQVLAIEPGKNPKHVQTKPGLFKTLTGEIGAVTLDFKPGTSGSPIINRKGKVIGLYGNGVVTKSGDYVSAITQAERIGEPDYEVDEDIFRKKRLTIMDLHPGAGKTKRILPSIVREALKRRLRTLILAPTRVVAAEMEEALRGLPIRYQTPAVKSEHTGREIVDLMCHATFTTRLLSSTRVPNYNLIVMDEAHFTDPSSVAARGYISTRVEMGEAAAIFMTATPPGATDPFPQSNSPIEDIEREIPERSWNTGFDWITDYQGKTVWFVPSIKAGNDIANCLRKSGKKVIQLSRKTFDTEYPKTKLTDWDFVVTTDISEMGANFRAGRVIDPRRCLKPVILTDGPERVILAGPIPVTPASAAQRRGRIGRNPAQEDDQYVFSGDPLKNDEDHAHWTEAKMLLDNIYTPEGIIPTLFGPEREKTQAIDGEFRLRGEQRKTFVELMRRGDLPVWLSYKVASAGISYKDREWCFTGERNNQILEENMEVEIWTREGEKKKLRPKWLDARVYADPMALKDFKEFASGRK

Ilheus_Aedes_Culex_Others length is 617

GVMWDVPAPKQFGKTELKPGVYRVMTMGILGRYQSGVGVMWDGVFHTMWHVTQGAALRNGEGRLNPTWGSVRDDLISYGGKWKLSATWNGSEEVQMIAVEPGKAAKNYQTKPGVFKTPAGEIGAITLDFPKGTSGSPIINKAGEITGLYGNGIVLERGAYVSAITQGERQEEETPEAFTPDMLKKRRLTILDLHPGAGKTRRVIPQIVRECVKARLRTVILVPTRVVAAEMAEALRGLPIRYQTSAVKAEHSGNEIVDAMRHATLTQRLLTPAKVPNYNVFVMDEAHFTDPASIAARGYISTKVELGEAAAIFMTATPPGTTDPFPDSNAPIIDQEAEIPDRAWNSGFEWITEYTGKTVWFVPSVRMGNEIAMCLTKAGKKVIQLNRKSYDSEYQKCKGNDWDFVITTDISEMGANFGAHRVIDSRKCVKPVILDGDDRVLMNGPAPITPASAAQRRGRIGRDPTQSGDEYFYGGPTTTDDTGHAHWIEAKILLDNIQLQNGLVAQLYGPERDKVFTTDGEYRLRSEQKKNFVEFLRTGDLPVWLSYKVAEAGYAYTDRRWCFDGPANNTILEVRGDPEVWTRQGEKRILRPRWSDARVYCDNQALRSFKEFAAGKR

Dengue3_Aedes length is 619

SGVLWDVPSPPETQKAELEEGVYRIKQQGILGKTQVGVGVQKEGVFHTMWHVTRGAVLTHNGKRLEPNWASVKKDLISYGGGWRLSAQWQKGEEVQVIAVEPGKNPKNFQTMPGTFQTTTGEIGAIALDFKPGTSGSPIINREGKVVGLYGNGVVTKNGGYVSGIAQTNAEPDGPTPELEEEMFKKRNLTIMDLHPGSGKTRKYLPAIVREAIKRRLRTLILAPTRVVAAEMEEALKGLPIRYQTTATKSEHTGREIVDLMCHATFTMRLLSPVRVPNYNLIIMDEAHFTDPASIAARGYISTRVGMGEAAAIFMTATPPGTADAFPQSNAPIQDEERDIPERSWNSGNEWITDFAGKTVWFVPSIKAGNDIANCLRKNGKKVIQLSRKTFDTEYQKTKLNDWDFVVTTDISEMGANFKADRVIDPRRCLKPVILTDGPERVILAGPMPVTAASAAQRRGRVGRNPQKENDQYIFTGQPLNNDEDHAHWTEAKMLLDNINTPEGIIPALFEPEREKSAAIDGEYRLKGESRKTFVELMRRGDLPVWLAHKVASEGIKYTDRKWCFDGQRNNQILEENMDVEIWTKEGEKKKLRPRWLDARTYSDPLALKEFKDFAAGRK

Hepatitis_C length is 631

APITAYAQQTRGLLGCIITSLTGRDKNQVEGEVQIVSTAAQTFLATCINGVCWTVYHGAGTRTIASSKGPVIQMYTNVDQDLVGWPAPQGARSLTPCTCGSSDLYLVTRHADVIPVRRRGDGRGSLLSPRPISYLKGSSGGPLLCPAGHAVGIFRAAVCTRGVAKAVDFIPVEGLETTMRSPVFSDNSSPPAVPQSYQVAHLHAPTGSGKSTKVPAAYAAQGYKVLVLNPSVAATLGFGAYMSKAHGIDPNIRTGVRTITTGSPITYSTYGKFLADGGCSGGAYDIIICDECHSTDATSILGIGTVLDQAETAGARLTVLATATPPGSVTVPHPNIEEVALSTTGEIPFYGKAIPLEAIKGGRHLIFCHSKKKCDELAAKLVALGVNAVAYYRGLDVSVIPASGDVVVVATDALMTGFTGDFDSVIDCNTCVTQTVDFSLDPTFTIETTTLPQDAVSRTQRRGRTGRGKPGIYRFVTPGERPSGMFDSSVLCECYDAGCAWYELTPAETTVRLRAYMNTPGLPVCQDHLEFWEGVFTGLTHIDAHFLSQTKQSGENLPYLVAYQATVCARAQAPPPSWDQMWKCLIRLKPTLHGPTPLLYRLGAVQNEITLTHPITKYIMTCMSADLEVVT

Dengue1_Aedes length is 619

SGVLWDTPSPPEVEKAVLDDGIYRILQRGLLGRSQVGVGVFQEGVFHTMWHVTRGAVLMYQGKRLEPSWASVKKDLISYGGGWRFQGSWNTGEEVQVIAVEPGKNPKNVQTAPGTFKTSEGEVGAIALDFKPGTSGSPIVNREGKIVGLYGNGVVTTSGTYVSAIAQAKASQEGPLPEIEDEVFRKRNLTIMDLHPGSGKTRRYLPAIVREAIKRKLRTLVLAPTRVVASEMAEALKGMPIRYQTTAVKSEHTGKEIVDLMCHATFTMRLLSPVRVPNYNMIIMDEAHFTDPASIAARGYISTRVGMGEAAAIFMTATPPGSVEAFPQSNAVIQDEEKDIPERSWNSGYDWITDFPGKTVWFVPSIKSGNDIANCLRKNGKRVIQLSRKTFDTEYQKTKNNDWDYVVTTDISEMGANFRADRVIDPRRCLKPVILKDGPERVILAGPMPVTVASAAQRRGRIGRNQNKEGDQYIYMGQPLNNDEDHAHWTEAKMLLDNINTPEGIIPALFEPEREKSAAIDGEYRLRGEARKTFVELMRRGDLPVWLSYKVASEGFQYSDRRWCFDGERNNQVLEENMDVEIWTKEGERKKLRPRWLDARTYSDPLALREFKEFAAGRR

Yellow_Fever_Aedes length is 622

GDVLWDIPTPKIIEECEHLEDGIYGIFQSTFLGASQRGVGVAQGGVFHTMWHVTRGAFLVRNGKKLIPSWASVKEDLVAYGGSWKLEGRWDGEEEVQLIAAVPGKNVVNVQTKPSLFKVRNGGEIGAVALDYPSGTSGSPIVNRNGEVIGLYGNGILVGDNSFVSAISQTEVKEEGKEELQEIPTMLKKGMTTVLDFHPGAGKTRRFLPQILAECARRRLRTLVLAPTRVVLSEMKEAFHGLDVKFHTQAFSAHGSGREVIDAMCHATLTYRMLEPTRVVNWEVIIMDEAHFLDPASIAARGWAAHRARANESATILMTATPPGTSDEFPHSNGEIEDVQTDIPSEPWNTGHDWILADKRPTAWFLPSIRAANVMAASLRKAGKSVVVLNRKTFEREYPTIKQKKPDFILATDIAEMGANLCVERVLDCRTAFKPVLVDEGRKVAIKGPLRISASSAAQRRGRIGRNPNRDGDSYYYSEPTSENNAHHVCWLEASMLLDNMEVRGGMVAPLYGVEGTKTPVSPGEMRLRDDQRKVFRELVRNCDLPVWLSWQVAKAGLKTNDRKWCFEGPEEHEILNDSGETVKCRAPGGAKKPLRPRWCDERVSSDQSALSEFIKFAEGRR

Omsk_hemorrhagic_fever_Tick length is 621

SDLVFSGQSGSERGSQPFEVRDGVYRILSPGLLWGHRQVGVGFGSKGVLHTMWHVTRGAAIFIDNAVAGPYWADVKEDVVCYGGAWSLEEKWKGEKVQVHAFPPGRAHEVHQCQPGELVLDTGRRIGAIPIDLAKGTSGSPILNAQGAVVGLYGNGLRTNETYVSSIAQGEVEKSRPNLPQAVVGTGWTSKGTITVLDMHPGSGKTHRVLPELIRQCIDKRLRTLVLAPTRVVLKEMERALSGKRVRFHSPAVGDQQTGNAIVDVMCHATYVNRRLLPQGRQNWEVAIMDEAHWTDPHSIAARGHLYSMAKENKCALVLMTATPPGKSEPFPESNGAITSEERQIPEGEWRDGFDWITEYEGRTAWFVPSIAKGGVIARTLRQKGKSVICLNSKTFEKDYSRVRDEKPDFVVTTDISEMGANLDVSRVIDGRTNIKPEEVDGKVELTGTRRVTTASAAQRRGRVGRHDGRTDEYIYSGQCDDDDSGLVQWKEAQILLDNITTLRGPVATFYGPEQDKMPEVAGHFRLTEERRKHFRHLLTHCDFTPWLAWHVAANVSNVTSRSWTWEGPEENAVDEANGDLVTFKSPNGAERTLRPVWRDARMFKEGRDIREFVAYASGRR

Zika_Mosquitoes_NonVector(HH) length is 617

SGALWDVPAPKEVKKGETTDGVYRVMTRRLLGSTQVGVGVMQEGVFHTMWHVTKGAALRSGEGRLDPYWGDVKQDLVSYCGPWKLDAAWDGLSEVQLLAVPPGERARNIQTLPGIFKTKDGDIGAVALDYPAGTSGSPILDKCGRVIGLYGNGVVIKNGSYVSAITQGKREEETPVECFEPSMLKKKQLTVLDLHPGAGKTRRVLPEIVREAIKKRLRTVILAPTRVVAAEMEEALRGLPVRYMTTAVNVTHSGTEIVDLMCHATFTSRLLQPIRVPNYNLNIMDEAHFTDPSSIAARGYISTRVEMGEAAAIFMTATPPGTRDAFPDSNSPIMDTEVEVPERAWSSGFDWVTDHSGKTVWFVPSVRNGNEIAACLTKAGKRVIQLSRKTFETEFQKTKNQEWDFVITTDISEMGANFKADRVIDSRRCLKPVILDGERVILAGPMPVTHASAAQRRGRIGRNPNKPGDEYMYGGGCAETDEGHAHWLEARMLLDNIYLQDGLIASLYRPEADKVAAIEGEFKLRTEQRKTFVELMKRGDLPVWLAYQVASAGITYTDRRWCFDGTTNNTIMEDSVPAEVWTKYGEKRVLKPRWMDARVCSDHAALKSFKEFAAGKR

Sepik_Aedes length is 623

SGVLWDIPTVVPPEEVGYLEDGVYTINQKNVLGMAQKGVGVVKDGVFHTMWHVTRGAFLLFEGKRLTPAWANVKEDLISYGGGWKLEAKWDGTEEVQLIAVAPGKNPMNIQTTPSIFQLTNGKEIGAVNLDFPSGTSGSPIVNKNGEVIGLYGNGILIGNNTYVSAITQSESSLEQDNDQLQDIPNMLRKGMLTVLDFHPGAGKTRVYLPQILKECERLRLKTLVLAPTRVVLSEMKEAMPKMSIKFHTQAFSNTATGKEIIDAMCHATLTHRMLEPTRVTNWEVVIMDEAHFMDPASIAARGWAAHRSRARECATIFMSATPPGTSNEFPESNGTIEDIRKDIPSEPWTKGHEWILEDRRPTAWFLPSIRVANSIANCLRKAERTVVVLNRKTFEKEYPTIKSKRPDFILATDIAEMGANLRVERVIDCRTAYKPFLVDDGTKVMVKGPLKISASSAAQRRGRVGRDPNRDTDTYVYGDSTTEDNGHYVCWTEGSMLLDNMEIKNGMIAPLYGIEGTKTTTVPGETRLRDEQRKVFRELVKRLDMPVWLSWHVAKAGLKVQDRSWCFDGEDDNTLLNDNGEPIFARSPGGGKKPLKPRWVDTRVCSDNSALIDFIKFAEGRR

Kokobera_Aedes_Culex length is 618

AGALWDIPSPREAKPAKVEDGVYRIFSRKLFGESQIGAGVMVKGTFHTMWHVTRGAVLKAGEGLLEPAWADVRKDLICYGGNWKLEEHWDGNEEVQLIALEPGKKVRHIQTKPGIFKTSEGEIGALDLDCMAGTSGSPIVNKNGEVVGLYGNGVLIKGDRYVSAISQKENVGQEDGAEIEDNWFRKRELTVLDLHPGAGKTRRVLPQLVREAVKKRLRTVILAPTRVVASEMYEALRGEPIRYMTPAVQSERTGNEIVDFMCHSTFTMKLFQGVRVPNYNLYIMDEAHFLDPASVAARGYIETRVSMGDAGAIFMTATPPGTTEAFPPSNSPIIDEETRIPDKAWNSGYEWIIEFDGRTVWFVHSIKQGAEIGTCLQKAGKKVLYLNRKTFESEYPKCKSEKWDFVITTDISEMGANFKADRVIDPRKTIKPILLDGRVSMQGPIAITPASAAQRRGRIGRNPEKLGDIYAYSGNVSSDNEGHVSWTEARMLLDNVHVQGGVVAQLYTPEREKTEAYEGEFKLKTNQRKVFSELIRTGDLPVWLAFQVASANVEYHDRKWCFDGPNEHLLLENNQEIEVWTRQGQRRVLKPRWLDGRITSDHLNLKSFKEFASGKRSA

Kedougou_mosquitoes length is 616

SGALWDMPAPRPAPPAVLGDGVYRIMSKKLLGPSQLGVGVMTQGVFHTMWHVTRGCTIEYNGGRLTPYWGSVKEDAISYGGPWKLTEKWDGTSEVQLIAVAPGKGAENVQTKPGIFNTATGTIGAVALDYPAGTSGSPILNSEGKVVGLYGNGVLVGGSSYVSAISQVKHPETQPQVVSDDMFRKKNLTILDLHPGAGKTRRILPEIVKQAIDRRMRTVILAPTRVVAAEMGEALKGLPVRYMTTAVQREHTGTEIVDVMCHATFTMRLIQPMRVPNYNLFVMDEAHFTDPASIAARGYIATKVNMGEAAAIFMTATPPGHADPFPDSNSPVDDMEIMVPDKPWNKGFEWVTDYPGKTVWFVPSVRNGNEIACCLEKAGKKTIQLNRKSFNTEYQRTKNEQWDFVITTDISEMGANFDAERVIDSRKCIKPVILEEGRVNLVGPIPVTASSAAQRRGRIGRNPSRTGDEYRYSGGTVEDDSDHAHWTEAKILLDNVQMPDRIIPQLYGPEQGKVNDVDGIYRLREPEKKVFIELLKRAELPVWLAYQVASAGIQYNDRKWCFDGERNNTILEDSVPAEVWTREGERKVLKPRWMDGRVCSDHAALRAFKEFAAGRR

Modoc_unknown length is 618

SLIVYGGRSESANDETEGDLLEGVYRITAASIFGRKQVGIGVLKEGVFHTMWHVTRGSALRVRGRVLAPEWASVEEDLVAYNGTWKLNKKWDGGEVQLHAYTPDGKIIKTQVLPGKMELAGGGSMGLIPLDFPPGSSGSPVINANGEVVGLYGNGVLHGDTYCSSIAQTTKELPVDKPKVVKGDGWLSKGRISIVDAHPGSGKTHRILPELVRRALSRKLRTLVLAPTRIVVKEMESALRGLDVSFHSSAVSHKNPGSLADVMCHATYVNRKLIHMPQKNYEVIIMDEAHWTDPSSIAARGFITSLCEAKRCAVVLMTATPPGVEDPWAESNEKINDVEKVIPDGPWKQGHEWITEYEGRTAWFVPSLNTANSMAKVLRGLGKKVVVLTSKTFHDNYPKLNDEKPDFILTTDISEMGANLDVERVIDPRTTLKPVEKGNAVEISGERKITPASAAQRRGRVGRTKGKKADYIYQGETNADDSELVCWKEAQMLLDNMESRLGSASVFYGPEQVKMTEMPGFYKLNDEKRKVFRHLLAQCDFTPWLAWKVASDGKSIENREWLKSGPKEHLVTDENCEAVTYKTPGGKVERLQPVWMDARMTREKRDLLNLLDYAEMRR

RioBravo_Bat length is 617

SIIVFGVGEDPEGEKVSIQDGVYRIMVSSLFGKKQVGVGIWTGNSFHTMWHVTRGAVLNICGRKIVPEWASITEDLISYNGGWRLFTRWDGTEVQVHAYTPDGKVTTTQLLPGSMKVEGGLELGLIPLDFPPGSSGSPIISSDGKIIGLYGNGVLHGDTYCSSIAQTEKKEEDVPQPKTLEGDGWLSKGKITVIDAHPGSGKTHRILPNLVKRAAERKMRTLVLAPTRVVIKEMESALKGMDISFHSSAVSTKTPGSLVDVMCHATFVNRKLIHTPQRNYEVIIMDEAHWTDPSSIAARGYITSQCEMKKCAVVLMTATPPGVDDPWANSNEKIADVEKMIPDEPWKQGYEWITDFEGRTAWFVPSYNAAQGISKALRERGKKVLILTSKTFHDNYPKIQSEKPDFILTTDISEMGANLDVDRVIDPRTTLKPMEKGNIVEVSGEISITPASAAQRRGRVGRVKGKKAEYIYQGTTEMDDSDLICWKEAQMLLDNMDSRQRATCQFYEPEQDKMTEIPGYYRLTEEKRKVFRHLLTQCDFTPWLAWNVAQNTKGIEDRGWLNIGPNQHLVSDENGDPIKYTTPSGRERQLQPVWLDNRMVKEKRDLVSLLEYAQMRR

Louping_ill_Tick length is 621

SDLVYSGQGGQERGDRPFEVKDGVYRIFSPGLFWGQRQVGVGYGHKGVLHTMWHVTRGAALSIDDAVAGPYWADVKEDVVCYGGAWSLEEKWKGETVQVHAFPPGRAHEVHQCQPGELILDTGKRLGAIPIDLAKGTSGSPILNAQGVVVGLYGNGPKTNESYVSSIAQGEAEKSRPNLPQAVVGTGWTSKGQITVLDMHPGSGKTHRVLPELIRQCIDRRLSTLVLAPTRVVLKEMERALSGKRVRFHSPAVSDQQAGGAIVDVMCHATYVNRRLLPQGRQNWEVAIMDEAHWTDPHSIAARGHLYTLAKENKCALVLMTATPPGKSEPFPESNGAITSEERQIPDGEWRDGFDWITEYEGRTAWFVPSIAKGGVIARTLRQKGKSVICLNSKTFEKDYSRVREEKPDFVVTTDISEMGANLDVSRVIDGRTNIKPEEVDGKVELTGTRRVTTASAAQRRGRVGRQDGRTDEYIYSGQCDDDDSGLVQWKEAQILLDNITTLRGPVATFYGPEQDKMPEVAGHFRLTEEKRKHFRHLLTHCDFTPWLAWHVAANVSSVTDRSWTWEGPEANAVDEASGDLVTFRSPNGAERTLRPVWRDARMFREGRDIKEFVAYASGRR

Powassan_Tick length is 622

TDLVFSGQLPDQGEKRSFDIKEGVYRIYAPGLFWGYRQIGVGYGTKGVLHTMWHVTRGAALSVEGATSGPYWADVREDVVCYGGAWGLDKKWGGEVVQVHAFPPDSGHKIHQCQPGKLNLEGGRVLGAIPIDLPRGTSGSPIINAQGDVLGLYGNGLKSNDVYISSIAQGNVEKSRPEMPLAVQGGKWTSKGSITVLDMHPGSGKTHRVLPELIRECIDKRLRTVVLAPTRVVLKEMERALQGKRVKFHSAAVDNASSSSGAIVDVMCHATYVNRRLLPQGRQNWEVAIMDEAHWTDPHSIAARGHLYSLAKENRCALVLMTATPPGKSEAFPESKGAIVSEEKPIPEGEWRDGFDWITEFEGRTAWFVPSIAKGGAIARTLRQKGKSVICLNSKTFDKDYGRVHEEKPDFVVTTDISEMGANLDVNRVIDGRTNIKPEEIDGKVELIGTRRVTTASAAQRRGRVGRHEGRTDLYVYSGQCDDDDSSLVQWKEAQILLDNITTVRGPVATFYGPEQGKMLEVAGHFRLTEEKRKHFRHLLTNCDFTPWLAWHVAANTACVTDRKWTWEGPDENAIDGPGGELVTFRSPNGAERKLKPIWKDSRMFREGRDVADFIQYASGRR

Alkhumra_hemorrhagic_fever_Tick length is 571

SELVFSGQGTRTERNRPFEIKDGAYRIYSPGLLWGHRQIGVGYGAKGVLHTMWHVTRGAALVVEEAISGPYWADVREDVVCYGGAWSLESRWRGETVQVHAFPPGRPQETHQCQPGELILENGRKLGAVPIDLSKGTSGSPIINAQGEVVGLYGNGLKTNEAYVSSIAQGEAEKSRPELPLSVQGTGWMSKGQITVLDMHPGSGKTHRVLPELVRQCANRGMRTLVLAPTRVVLKEMEKALAGKKVRFHSPAVEGQSTAGAVVDVMCHATYVHRRLLPQGRQNWEVAIMDEAHWTDPHSIAARGHLYSLAKENRCALVLMTATPPGRGDPFPESNGAIMSEERAIPDGEWREGFDWITDYEGRTAWFVPSISKGGAIARTLRQRGKSVICLNSKTFEKDYLRVREEKPDFVVTTDISEMGANLDVSRVIDGRTNIKPEEVDGKVEMTGTWKEAQILLDNITTLRGPVATFYGPEQMKMPEVAGHYRLNEEKRKHFRHLMWHVATNTSNVLDRSWTWQGPEGNAIDGADGDLVRFKTPGGSERVLQPVWKDCRMFREGRDVKDFILYASGRR

Karshi_Tick length is 621

GELVFSGVGGRTENAPFEVKDGVYRIFRPGLFWGASQIGVGYGSHGVLHTMWHVTRGAAVSINGGAVGPYWADIREDVVCYGGAWNLPTKWEGEVVQLHAFPPGRAHEIKQCQPGRLNLGNGRVMGAIPFDLPKGTSGSPILNAQGVVVGLYGNGLKTKDTYVSGIAQGTPEVSGHEMPLVVQGTGWMSKGEITVIDMHPGAGKTHRVLPELIRQCIDRRLRTLILAPTRVVLREMEQALKGKKVRFHSPAVEAPPNRDAIVDVMCHATYVNRRLSPTGRQNWEVAIMDEGHWTDPHSIAARGHLYTLAKDKKCAFVLMSATPPGQNEPFPESKGPIISEEKAIPDGEWQNGFDWVTEYEGRTAWFVASIARGAAIARTLRAKGKSVICLNSKTFEKDYHRVHEEKPDFVVTTDISEMGANFDVDRVIDGRNNIKPEEIDGAVELIGVRKVTTASAAQRRGRVGRKEGRTDRYVYSGQCDDDDSMLVQWKEAQMLLDNITTLRGMAAVFYGPEQERMPEVAGFYRLTEEKRKYFRHLLSQNDFTPWLAWHVAKNTAGPIDRSWTWEGPEGNAIEGQNGEVVRFRTPGGAEKILKPTWRDMRMFREGRDIREFVQYASCRRG

Montana_myotis_Bat length is 616

SLYLLGLGEGEPESNQRITDGVYRIKVSSLFGKKQVGVGVWSEGSFHTMWHVTRGATLRIGDRKLSPEWANITEDLISYNGGWKLEKKWKGEEVQLHAFTPSNETVVTQLLPGSMKTEKGEELGLIPLDFPPGTSGSPIITSSGHVVGLYGNGIIHGDVYCSSIAQAKEVVKEETVKAVEGDEWLAKGKITVIDAHPGSGKTHKILPSLVRRASERRMRTLVLAPTRVVIKEMENALRGMDVSFHSSAVSSKTPGSLIDVMCHATFVNRKLIHMPQKNYELIIMDEAHWTDPSSIAARGFITTMSENKKCAVVLMTATPPGVQDPWANSNEKIEDVVKVIPEGPWKQGHEWITEFEGRTAWFVPSQNAAQGIAKTLREHGKKVAILTSKTFHDVYPKLKTEKPDFILTTDISEMGANFDVERVIDPRTTLKPIEKGNTVEISGEIQITPASAAQRRGRVGRTPGKMAQYIYQGEVEPDDSELVCWKEGQMLLDNMSVKQSMICTFYGPEQEKMPETPGFFRLSEEKRKVFRHLITQCDFTPWLAWNVASGTKGIEDRDWVNLGPKENLVTDENDDPIVYKSPGGKEHKLAPVWLDTRLTRERKDLAGFIEYAEKRR

Bussuquara_Aedes_Culex_Others length is 620

GALWDIPAPREVKKGSTENGVYRILANRLFGKTQVGVGVMHEGVFHTMWHVTRGAALKSGEGRLDPYWGDVKKDLISYGGPWKLEGRWDGVSEVQLIAVPPKEKAKNVQTTPGVFKTPHGEIGAIVLDFPAGSSGSPIINKLGEVIGLYGNGLMMGDAYASSIAQAEVEDEPDTPNCLPPDVTHKKKLTVLDLHPGAGKTRKVLPKLLQEALEKRLRTVVLAPTRVVAAEMAEALKGMPIRYQTAAVTSSHSGNEIIDLMCHATFTSRLMQPHRVPNYNLYIMDEAHFTDPASIAARGFIATKVSLGEAAAVFMTATPPGSDNPFPASNAPITDTEAQIPDKAWSTGFDWITEYGGKTVWFVPSVRMGNEIAACLTKAKKKVIQLSRRTFNTEYPKCKQGDWDFVVTTDISEMGANFKATRVIDSRRAIKPSIMQDQEERVVLSGPTPISPASAAQRRGRVGRNPNQLGDEYVFSGLTQANDEGNACWTEARMLLDNIHMQNGLIAQLYGPEQDKCFATDGEFKLREKERATFLEFLKADLPVWLSFKAASSGVQYHDRKWCFDGPDNNLVLEDNVPVEIWTKSGERKKLKPRWSDARTYCDHGALTAFKEFAGGRRSVT

Tick_Borne_Encephalitis_Tick length is 620

DLVFSGQGGRERGDRPFEVKDGVYRIFSPGLFWGQSQVGVGYGSKGVLHTMWHVTRGAALSIDDAVAGPYWADVREDVVCYGGAWSLEEKWKGETVQVHAFPPGKAHEVHQCQPGELILDTGRKLGAIPIDLVKGTSGSPILNAQGVVVGLYGNGLKTNETYVSSIAQGEAEKSRPNLPQAVVGTGWTSKGQITVLDMHPGSGKTHRVLPELIRQCIDRRLRTLVLAPTRVVLKEMERALNGKRVRFHSPAVSDQQAGGAIVDVMCHATYVNRRLLPQGRQNWEVAIMDEAHWTDPHSIAARGHLYTLAKENKCALVLMTATPPGKSEPFPESNGAITSEERQIPDGEWRDGFDWITEYEGRTAWFVPSIAKGGAIARTLRQKGKSVICLNSKTFEKDYSRVRDEKPDFVVTTDISEMGANLDVSRVIDGRTNIKPEEVDGKVELTGTRRVTTASAAQRRGRVGRQDGRTDEYIYSGQCDDDDSGLVQWKEAQILLDNITTLRGPVATFYGPEQDKMPEVAGHYRLTEEKRKHFRHLLTHCDFTPWLAWHVAANVSSVTDRSWTWEGPEANAVDEASGDLVTFRSPNGAERTLRPVWKDARMFKEGRDIKEFVAYASGRR

Usutu_Culex length is 619

GGVFWDTPAPRTYPKGDTSPGVYRIMSRYILGTYQAGVGVMYEGVLHTLWHTTRGAAIRSGEGRLTPYWGSVKEDRITYGGPWKFDRKWNGLDDVQLIIVAPGKAAINIQTKPGIFKTPQGEIGAVSLDYPEGTSGSPILDKNGDIVGLYGNGVILGNGSYVSAIVQGEREEEPVPEAYNADMLRKKQLTVLDLHPGAGKTRRILPQIIKDAIQRRLRTAVLAPTRVVAAEMAEALKGLPVRYLTPAVNREHSGTEIVDVMCHATLTHRLMSPLRAPNYNLFVMDEAHFTDPASIAARGYIATKVELGEAAAIFMTATPPGTHDPFPDTNAPVTDIQAEVPDRAWSSGFEWITEYTGKTVWFVASVKMGNEIAQCLQRAGKKVIQLNRKSYDTEYPKCKNGDWDFVITTDISEMGANFGASRVIDCRKSVKPTILEEGEGRVILSNPSPITSASAAQRRGRVGRNPSQIGDEYHYGGGTSEDDTIAAHWTEAKIMLDNIHLPNGLVAQMYGPERDKAFTMDGEYRLRGEERKTFLELLRTADLPVWLAYKVASNGIQYTDRKWCFDGPRSNIILEDNNEVEIVTRTGERKMLKPRWLDARVYADHQSLKWFKDFAAGKR

Dengue2_Aedes length is 618

AGVLWDVPSPPPMGKAELEDGAYRIKQKGILGYSQIGAGVYKEGTFHTMWHVTRGAVLMHKGKRIEPSWADVKKDLISYGGGWKLEGEWKEGEEVQVLALEPGKNPRAVQTKPGLFRTNAGTIGAVSLDFSPGTSGSPIIDKKGKVVGLYGNGVVTRSGAYVSAIAQTEKSIEDNPEIEDDIFRKRRLTIMDLHPGAGKTKRYLPAIVREAIKRGLRTLILAPTRVVAAEMEEALRGLPIRYQTPAIRAEHTGREIVDLMCHATFTMRLLSPVRVPNYNLIIMDEAHFTDPASIAARGYISTRVEMGEAAGIFMTATPPGSRDPFPQSNAPIIDEEREIPERSWNSGHEWVTDFKGKTVWFVPSIKAGNDIAACLRKNGKKVIQLSRKTFDSEYVKTRTNDWDFVVTTDISEMGANFKAERVIDPRRCMKPVILTDGEERVILAGPMPVTHSSAAQRRGRIGRNPKNENDQYIYMGEPLENDEDCAHWKEAKMLLDNINTPEGIIPSMFEPEREKVDAIDGEYRLRGEARKTFVDLMRRGDLPVWLAYKVAAEGINYADRRWCFDGIKNNQILEENVEVEIWTKEGERKKLKPRWLDARIYSDPLALKEFKEFAAGRK


As you can see, the NS3 protein from different flaviviruses are of different lengths and are different in their sequence. The NS3 proteins from different dengue virus serotypes are shown below:

In [3]:
dengue_records = [r for r in records if 'dengue' in r.id.lower()]
for r in dengue_records:
    print r.id, 'length is', len(r)
    print
    print str(r.seq)
    print
Dengue4_Aedes length is 618

SGALWDVPSPAATQKAALSEGVYRIMQRGLFGKTQVGVGIHMEGVFHTMWHVTRGSVICHETGRLEPSWADVRNDMISYGGGWRLGDKWDKEEDVQVLAIEPGKNPKHVQTKPGLFKTLTGEIGAVTLDFKPGTSGSPIINRKGKVIGLYGNGVVTKSGDYVSAITQAERIGEPDYEVDEDIFRKKRLTIMDLHPGAGKTKRILPSIVREALKRRLRTLILAPTRVVAAEMEEALRGLPIRYQTPAVKSEHTGREIVDLMCHATFTTRLLSSTRVPNYNLIVMDEAHFTDPSSVAARGYISTRVEMGEAAAIFMTATPPGATDPFPQSNSPIEDIEREIPERSWNTGFDWITDYQGKTVWFVPSIKAGNDIANCLRKSGKKVIQLSRKTFDTEYPKTKLTDWDFVVTTDISEMGANFRAGRVIDPRRCLKPVILTDGPERVILAGPIPVTPASAAQRRGRIGRNPAQEDDQYVFSGDPLKNDEDHAHWTEAKMLLDNIYTPEGIIPTLFGPEREKTQAIDGEFRLRGEQRKTFVELMRRGDLPVWLSYKVASAGISYKDREWCFTGERNNQILEENMEVEIWTREGEKKKLRPKWLDARVYADPMALKDFKEFASGRK

Dengue3_Aedes length is 619

SGVLWDVPSPPETQKAELEEGVYRIKQQGILGKTQVGVGVQKEGVFHTMWHVTRGAVLTHNGKRLEPNWASVKKDLISYGGGWRLSAQWQKGEEVQVIAVEPGKNPKNFQTMPGTFQTTTGEIGAIALDFKPGTSGSPIINREGKVVGLYGNGVVTKNGGYVSGIAQTNAEPDGPTPELEEEMFKKRNLTIMDLHPGSGKTRKYLPAIVREAIKRRLRTLILAPTRVVAAEMEEALKGLPIRYQTTATKSEHTGREIVDLMCHATFTMRLLSPVRVPNYNLIIMDEAHFTDPASIAARGYISTRVGMGEAAAIFMTATPPGTADAFPQSNAPIQDEERDIPERSWNSGNEWITDFAGKTVWFVPSIKAGNDIANCLRKNGKKVIQLSRKTFDTEYQKTKLNDWDFVVTTDISEMGANFKADRVIDPRRCLKPVILTDGPERVILAGPMPVTAASAAQRRGRVGRNPQKENDQYIFTGQPLNNDEDHAHWTEAKMLLDNINTPEGIIPALFEPEREKSAAIDGEYRLKGESRKTFVELMRRGDLPVWLAHKVASEGIKYTDRKWCFDGQRNNQILEENMDVEIWTKEGEKKKLRPRWLDARTYSDPLALKEFKDFAAGRK

Dengue1_Aedes length is 619

SGVLWDTPSPPEVEKAVLDDGIYRILQRGLLGRSQVGVGVFQEGVFHTMWHVTRGAVLMYQGKRLEPSWASVKKDLISYGGGWRFQGSWNTGEEVQVIAVEPGKNPKNVQTAPGTFKTSEGEVGAIALDFKPGTSGSPIVNREGKIVGLYGNGVVTTSGTYVSAIAQAKASQEGPLPEIEDEVFRKRNLTIMDLHPGSGKTRRYLPAIVREAIKRKLRTLVLAPTRVVASEMAEALKGMPIRYQTTAVKSEHTGKEIVDLMCHATFTMRLLSPVRVPNYNMIIMDEAHFTDPASIAARGYISTRVGMGEAAAIFMTATPPGSVEAFPQSNAVIQDEEKDIPERSWNSGYDWITDFPGKTVWFVPSIKSGNDIANCLRKNGKRVIQLSRKTFDTEYQKTKNNDWDYVVTTDISEMGANFRADRVIDPRRCLKPVILKDGPERVILAGPMPVTVASAAQRRGRIGRNQNKEGDQYIYMGQPLNNDEDHAHWTEAKMLLDNINTPEGIIPALFEPEREKSAAIDGEYRLRGEARKTFVELMRRGDLPVWLSYKVASEGFQYSDRRWCFDGERNNQVLEENMDVEIWTKEGERKKLRPRWLDARTYSDPLALREFKEFAAGRR

Dengue2_Aedes length is 618

AGVLWDVPSPPPMGKAELEDGAYRIKQKGILGYSQIGAGVYKEGTFHTMWHVTRGAVLMHKGKRIEPSWADVKKDLISYGGGWKLEGEWKEGEEVQVLALEPGKNPRAVQTKPGLFRTNAGTIGAVSLDFSPGTSGSPIIDKKGKVVGLYGNGVVTRSGAYVSAIAQTEKSIEDNPEIEDDIFRKRRLTIMDLHPGAGKTKRYLPAIVREAIKRGLRTLILAPTRVVAAEMEEALRGLPIRYQTPAIRAEHTGREIVDLMCHATFTMRLLSPVRVPNYNLIIMDEAHFTDPASIAARGYISTRVEMGEAAGIFMTATPPGSRDPFPQSNAPIIDEEREIPERSWNSGHEWVTDFKGKTVWFVPSIKAGNDIAACLRKNGKKVIQLSRKTFDSEYVKTRTNDWDFVVTTDISEMGANFKAERVIDPRRCMKPVILTDGEERVILAGPMPVTHSSAAQRRGRIGRNPKNENDQYIYMGEPLENDEDCAHWKEAKMLLDNINTPEGIIPSMFEPEREKVDAIDGEYRLRGEARKTFVDLMRRGDLPVWLAYKVAAEGINYADRRWCFDGIKNNQILEENVEVEIWTKEGERKKLKPRWLDARIYSDPLALKEFKEFAAGRK


Sequence alignment: Measuring similarity between protein sequences

The answer to the question of which dengue came first, lies in exactly how similar these proteins (or overall virus polyproteins) are to eachother. Once we know the complete genomic sequence or the sequence of amino acids of one or more proteins, we can use this information to find out how similar one organism is to another. This is done through the bioinformatics technique of sequence alignment. Sequence alignment algorithms like Smith-Waterman, Needleman-Wunsch or BLAST, given two sequences, gives you a scor of how similar the two sequences are and how they align with respect to each other. These algorithms return a score which is indicative of how good the alignment is. Below, we align, using Biopython, the NS3 protein sequence of a dengue virus serotype against the corresponding proteins from West Nile and Hepatitis C virus.

In [4]:
import Bio.pairwise2 as pairwise2
from Bio.SubsMat import MatrixInfo as matlist
from time import time
def algin(s1,s2):
    matrix = matlist.blosum62
    gap_open = -10
    gap_extend = -0.1
    start = time()
    alns = pairwise2.align.localds(s1, s2, matrix, gap_open, gap_extend)
    aln = alns[0]
    print "alignment done in %.2f seconds" %(time()-start)
    aln_s1, aln_s2, score, begin, end = aln
    print 'sequence 1:',s1.id,':',len(s1),'\n'+str(aln_s1.seq)
    print 'sequence 2:',s2.id,':',len(s2),'\n'+str(aln_s2.seq)
    print 'alignment score:',score
    
denv = dengue_records[0]
wnv = [r for r in records if 'west_nile' in r.id.lower()][0]
hepc = [r for r in records if 'hepatitis' in r.id.lower()][0]
print '*'*20
print 'DENV-WNV'
algin(denv,wnv)
print '*'*20
print 'DENV-HEPC'
algin(denv,hepc)
********************
DENV-WNV
alignment done in 48.73 seconds
sequence 1: Dengue4_Aedes : 618 
SGALWDVPSPAATQKAALSEGVYRIMQRGLFGKTQVGVGIHMEGVFHTMWHVTRGSVICHETGRLEPSWADVRNDMISYGGGWRLGDKWDKEEDVQVLAIEPGKNPKHVQTKPGLFKTLTGEIGAVTLDFKPGTSGSPIINRKGKVIGLYGNGVVTKSGDYVSAITQAERIGEP---DYEVDEDIFRKKRLTIMDLHPGAGKTKRILPSIVREALKRRLRTLILAPTRVVAAEMEEALRGLPIRYQTPAVKSEHTGREIVDLMCHATFTTRLLSSTRVPNYNLIVMDEAHFTDPSSVAARGYISTRVEMGEAAAIFMTATPPGATDPFPQSNSPIEDIEREIPERSWNTGFDWITDYQGKTVWFVPSIKAGNDIANCLRKSGKKVIQLSRKTFDTEYPKTKLTDWDFVVTTDISEMGANFRAGRVIDPRRCLKPVILTDGPERVILAGPIPVTPASAAQRRGRIGRNPAQEDDQYVFSGDPLKNDEDHAHWTEAKMLLDNIYTPEGIIPTLFGPEREKTQAIDGEFRLRGEQRKTFVELMRRGDLPVWLSYKVASAGISYKDREWCFTGERNNQILEENMEVEIWTREGEKKKLRPKWLDARVYADPMALKDFKEFASGRK
sequence 2: West_Nile_Culex : 619 
GGVLWDTPSPKEYKKGDTTTGVYRIMTRGLLGSYQAGAGVMVEGVFHTLWHTTKGAALMSGEGRLDPYWGSVKEDRLCYGGPWKLQHKWNGHDEVQMIVVEPGKNVKNVQTKPGVFKTPEGEIGAVTLDYPTGTSGSPIVDKNGDVIGLYGNGVIMPNGSYISAIVQGERMEEPAPAGFE--PEMLRKKQITVLDLHPGAGKTRKILPQIIKEAINKRLRTAVLAPTRVVAAEMSEALRGLPIRYQTSAVHREHSGNEIVDVMCHATLTHRLMSPHRVPNYNLFIMDEAHFTDPASIAARGYIATKVELGEAAAIFMTATPPGTSDPFPESNAPISDMQTEIPDRAWNTGYEWITEYVGKTVWFVPSVKMGNEIALCLQRAGKKVIQLNRKSYETEYPKCKNDDWDFVITTDISEMGANFKASRVIDSRKSVKPTIIEEGDGRVILGEPSAITAASAAQRRGRIGRNPSQVGDEYCYGGHTNEDDSNFAHWTEARIMLDNINMPNGLVAQLYQPEREKVYTMDGEYRLRGEERKNFLEFLRTADLPVWLAYKVAAAGISYHDRKWCFDGPRTNTILEDNNEVEVITKLGERKILRPRWADARVYSDHQALKSFKDFASGKR
alignment score: 2180.7
********************
DENV-HEPC
alignment done in 55.22 seconds
sequence 1: Dengue4_Aedes : 618 
SGALWDVPSPAATQKAALSEGVYRIMQRGLFG----------KTQVGVGIHM-------------EGVFHTMWHVTRGSVICHETGRLEPSWADVRNDMISYGGGW--------------------------------RLGDKWDKEEDVQVLAIEPGK----NPKHVQTKPGLFKTLTGEIGAVTLDFKPGTSGSPIINRKGKVIGLYGNGVVT----KSGDYVS--------------------AITQAERIGEPDYEVDEDIFRKKRLTIMDLHP--GAGKTKRILPSIVREALKRRLRTLILAPTRVVAAEM------------EEALR--------GLPIRYQTPAVKSEHTGREIVDLMCHATFTTRLLSSTRVPNYNLIVMDEAHFTDPSSVAARGYISTRVEM-GEAAAIFMTATPPGA-TDPFPQSNSPIEDIER----EIPERSWNTGFDWITDYQGKTVWFVP--SIKAGNDIANCLRKSGKKVIQLSRK---------------------------TFDTEYPKTKLT-DWDFVV--------TTDISEMGANFRAGRVIDPRRCLKPVILTDGPERVILAGPIPVTPASAAQRRGRIGRNPAQEDDQYVFSGDP---------------------------------------------------LKNDEDHAH-WTEAKMLLDNIYTPEGIIPTLFGPEREKTQAIDGEF----RLRGEQRKTFVELMRRGDLPVWLSYKV-----ASAGISYKDREW-CFT----------------GERNNQI-------------LEENMEVEIWTREGEKKKLRPKWLDARVYADPMALKDFKEFASGRK
sequence 2: Hepatitis_C : 631 
-----------------APITAYAQQTRGLLGCIITSLTGRDKNQVEGEVQIVSTAAQTFLATCINGVCWTVYHGAGTRTIASSKGPVIQMYTNVDQDLV----GWPAPQGARSLTPCTCGSSDLYLVTRHADVIPVRRRGD---------------GRGSLLSPRPIS-------------------YLKGSSGGPLLCPAGHAVGIFRAAVCTRGVAKAVDFIPVEGLETTMRSPVFSDNSSPPAVPQSYQVAH-------------------LHAPTGSGKSTKVPAAYAAQGYK----VLVLNPS--VAATLGFGAYMSKAHGIDPNIRTGVRTITTGSPITYST-------YGKFLADGGCSGGA------------YDIIICDECHSTDATSILGIGTVLDQAETAGARLTVLATATPPGSVTVPHPN----IEEVALSTTGEIP-------------FYGKAI---PLEAIKGGRHLIFCHSK--KKCDELAAKLVALGVNAVAYYRGLDVSVIPASGDVVVVATDALMTGFTGDFDSVIDCNTCVTQTVDFS-----------LDPTFTIETTTL-------------PQDAVSRTQRRGRTGR------------GKPGIYRFVTPGERPSGMFDSSVLCECYDAGCAWYELTPAETTVRLRAYMNTPGLPVCQDHLEFW-------------EGVFTGL--------THIDAHFLSQTKQSGE------------NLPYLVAYQATVCARAQAPPPSWDQMWKCLIRLKPTLHGPTPLLYRLGAVQNEITLTHPITKYIMTCMSADLEVVT-------------------------------------
alignment score: 284.8

As you can see above, the DENV NS3 is more similar to the WNV NS3 than the HEPC NS3.

Phylogenetic trees

We can compare all the virus species using the same way and then, based on the similarities, construct a tree (formally caleld a phylogenetic tree) which shows how each viral species relates to the other: two viruses that are similar should be grouped into the same branch of the tree. Generation of this tree requires multiple sequence (or all pair) pairwise alignments.

The alignment algorithm used above is a deterministic Dynamic programming algorithm and is, consequently, quite slow. Even though Biopython supports the generation and display of phylogenetic trees, we use the much faster MEGA6 package which can not only generate multiple sequence alignments using CLUSTALW but can also generate phylogenetic trees.

The code below, uses the NS3.fasta file to generate a multiple sequence alignments in its own '.meg' format and a tree in the newick file format. It demonstrates how easy it is to use Python as glue.

In [5]:
import os

def getFileParts(fname):
    """
    Returns the parts of a file (path,name,ext)
    """
    (path, name) = os.path.split(fname)
    n=os.path.splitext(name)[0]
    ext=os.path.splitext(name)[1]
    return (path,n,ext)

class Mega:
    """
    Wrapper for the Mega 6 Computational Core program for Phylogenetics 
    You can download the Mega package from: http://www.megasoftware.net/megacc.php    
    Use the Mega 6 analysis settings prototyper (included in the mega package) to generate the settings files
    """
    def __init__(self,megaexe='M6CC.exe'):
        """
        megaexe is the path to the executable
        """
        assert os.path.isfile(megaexe), "Mega executable missing" #file must exist
        self.mexe = megaexe
        
    def align(self,settings,ifile,ofile=None): 
        """
        Generation of alignments using Mega
        input:
            settings: path to settings file for alignment
            ifile: path to input fasta or sequence file
            ofile: path to output file (if not specified, name automatically generated)
        output:
            result: whether the exe was run successfully (0) or not
            ofile: path to the output .meg file
        """
        if ofile is None:
            opath,oname,oext = getFileParts(ifile)
            ofile = os.path.join(opath,oname+'_aln')
        if os.path.isfile(ofile+'.meg'):
            print 'Warning:', 'existing file',ofile+'.meg','will be removed.'
            os.remove(ofile+'.meg')             
        cmd = self.mexe + ' -a '+settings+' -d '+ifile+' -o '+ofile
        result = os.system(cmd)
        ofile+='.meg'
        return result,ofile
    
    def phylo(self,settings,ifile,ofile=None):  
        """
        Generation of phylogenetic tree using Mega
        input:
            settings: path to settings file for tree generation
            ifile: path to input multiple sequence alignment (.meg) file
            ofile: path to output newick file (if not specified, name automatically generated)
        output:
            result: whether the exe was run successfully (0) or not
            ofile: path to the output .nwk file
        """        
        if ofile is None:
            opath,oname,oext = getFileParts(ifile)
            ofile = os.path.join(opath,oname+'_phy')            
        if os.path.isfile(ofile+'.nwk'):
            print 'Warning:', 'existing file',ofile+'.nwk','will be removed.'
            os.remove(ofile+'.nwk')
        cmd = self.mexe + ' -a '+settings+' -d '+ifile+' -o '+ofile
        result = os.system(cmd)
        ofile+='.nwk'
        return result,ofile
        
mega = Mega()
aresult,afile = mega.align(settings = 'aln_clustal_blosum.mao',ifile='NS3.fasta') #settings: using clustal default with blosum matrix
presult,tfile = mega.phylo(settings = 'tree_ml.mao',ifile=afile) #settings: using Maximum Likelihood generator of phylo tree
print 'Output alignment file generated:',afile
print 'Output phylogenetic tree file generated:',tfile
Warning: existing file NS3_aln.meg will be removed.
Warning: existing file NS3_aln_phy.nwk will be removed.
Output alignment file generated: NS3_aln.meg
Output phylogenetic tree file generated: NS3_aln_phy.nwk

Once the files have been generated, we can use Biopython to read those files and display the tree.

In [6]:
from Bio import AlignIO
from Bio import Phylo
import matplotlib.pyplot as plt

def meg2fasta(ifile,ofile = None):
    """
    Coverts mega format files to fasta files
    """
    if ofile is None:
        path,name,ext = getFileParts(ifile)
        ofile = os.path.join(path,name+'.fasta')
    with open(ifile,'r') as ifx, open(ofile,'w') as ofx:
        for i in ifx:
            if not i.strip() or '#mega' in i or i[0]=='!':
                continue            
            if i[0]=='#':            
                ofx.write('>'+i[1:])
            else:
                ofx.write(i)
    return ofile

align = AlignIO.read(meg2fasta(afile), "fasta")
tree = Phylo.read(tfile, "newick")
print align #print the alignment

tree.ladderize()   # Flip branches so deeper clades are displayed at top
Phylo.draw(tree,do_show=False)
ax = plt.gca().axis()
plt.axis((0.9,7)+ax[2:])
plt.show()
SingleLetterAlphabet() alignment with 26 rows and 643 columns
-GGVLWDTPSPKEYK-KGDTTTGVYRIMTRGLLG-SYQAGAGVM...--- West_Nile_Culex
--GVFWDTPSPKPCS-KGDTTTGVYRIMARGILG-TYQAGVGVM...R-- Japanese_encephalitis_Culex
-SGAIWDVPAPKEKK-RAELSTGVFRIMARGILG-KYQAGVGVM...--- Bagaza_Aedes_Culex_Others
SNTSFSLITDILESDDHESPNEGIYRIFASGVLG-KKQVGVGIW...--- Apoi_Rodents
-SGALWDVPSPAATQ-KAALSEGVYRIMQRGLFG-KTQVGVGIH...--- Dengue4_Aedes
--GVMWDVPAPKQFG-KTELKPGVYRVMTMGILG-RYQSGVGVM...--- Ilheus_Aedes_Culex_Others
-SGVLWDVPSPPETQ-KAELEEGVYRIKQQGILG-KTQVGVGVQ...--- Dengue3_Aedes
APITAYAQQTRGLLGCIITSLTGRDKNQVEGEVQIVSTAAQTFL...--- Hepatitis_C
-SGVLWDTPSPPEVE-KAVLDDGIYRILQRGLLG-RSQVGVGVF...--- Dengue1_Aedes
-GDVLWDIPTPKIIEECEHLEDGIYGIFQSTFLG-ASQRGVGVA...--- Yellow_Fever_Aedes
SDLVFSGQSGSERGSQPFEVRDGVYRILSPGLLWGHRQVGVGFG...--- Omsk_hemorrhagic_fever_Tick
-SGALWDVPAPKEVK-KGETTDGVYRVMTRRLLG-STQVGVGVM...--- Zika_Mosquitoes_NonVector(HH)
-SGVLWDIPTVVPPEEVGYLEDGVYTINQKNVLG-MAQKGVGVV...--- Sepik_Aedes
-AGALWDIPSPREAK-PAKVEDGVYRIFSRKLFG-ESQIGAGVM...SA- Kokobera_Aedes_Culex
-SGALWDMPAPRPAP-PAVLGDGVYRIMSKKLLG-PSQLGVGVM...--- Kedougou_mosquitoes
SLIVYGGRSESANDETEGDLLEGVYRITAASIFG-RKQVGIGVL...--- Modoc_unknown
--SIIVFGVGEDPEGEKVSIQDGVYRIMVSSLFG-KKQVGVGIW...--- RioBravo_Bat
SDLVYSGQGGQERGDRPFEVKDGVYRIFSPGLFWGQRQVGVGYG...--- Louping_ill_Tick
...
-AGVLWDVPSPPPMG-KAELEDGAYRIKQKGILG-YSQIGAGVY...--- Dengue2_Aedes

Discussion

The y-axis of the plot showing the phylogenetic tree is meaningless but the x-axis is derived from the degrees of similarities between the NS3 protein from different viruses and can be interpreted as a measure of number of genetic changes or mutations in this protein over time. A split in the tree shows when two known viruses species diverged through evolution. This indicates that, out of the 4 dengue species, Dengue-4 arose before other versions of the same virus.

This plot shows how different viruses are related to one another. Even though we haven't used information about the vectors of these viruses in our analysis, the viruses are (almost) neatly clustered based on their vectors: Mosquito (Culex or Aedes) borne viruses are clustered separately from the tick borne viruses which form their own cluster. Note that the mosquito viruses (in Aedes or Culex) arose independently of flaviviruses in ticks, bats and rodents (the split at branch length ~3). It also shows that the Hepatitis virus arose independently of other types of flaviviruses considered here.

This is a simple sample of what Bioinformatics is all about -- considering biological systems as information process and using computers to solve biological problems. It also shows how python can be used for Bioinformatics.

For more details on the phylogenetic analysis of Flaviviruses, see this paper.

Similar phylogenetic analysis at a larger scale has shwon that all life is linked into a tree of life. You can get an interactive version of the tree of life here.