The CanProVar database provides three protein databases (*_protein) in FASTA format for download. The *_protein files contain only the original (unmutated) full-length protein sequences; all mutations/SNPs within a sequence are listed in the header line. For example:

>ENSP00000348429 ENST00000356116 rs3736946:M238V;cs3459:K444R;cs2435:G466D;rs12254915:T542A
MDALKPPCLWRNHERGKKDRDSCGRKNSEPGSPHSLEALRDAAPSQGLNFLLLFTKMLFI
FNFLFSPLPTPALICILTFGAAIFLWLITRPQPVLPLLDLNNQSVGIEGGARKGVSQKNN
DLTSCCFSDAKTMYEVFQRGLAVSDNGPCLGYRKPNQPYRWLSYKQVSDRAEYLGSCLLH
KGYKSSPDQFVGIFAQNRPEWIISELACYTYSMVAVPLYDTLGPEAIVHIVNKADIAMVI
CDTPQKALVLIGNVEKGFTPSLKVIILMDPFDDDLKQRGEKSGIEILSLYDAENLGKEHF
RKPVPPSPEDLSVICFTSGTTGDPKGAMITHQNIVSNAAAFLKCVEHAYEPTPDDVAISY
LPLAHMFERIVQAVVYSCGARVGFFQGDIRLLADDMKTLKPTLFPAVPRLLNRIYDKVQN
EAKTPLKKFLLKLAVSSKFKELQKGIIRHDSFWDKLIFAKIQDSLGGRVRVIVTGAAPMS
TSVMTFFRAAMGCQVYEAYGQTECTGGCTFTLPGDWTSGHVGVPLACNYVKLEDVADMNY
FTVNNEGEVCIKGTNVFKGYLKDPEKTQEALDSDGWLHTGDIGRWLPNGTLKIIDRKKNI
FKLAQGEYIAPEKIENIYNRSQPVLQIFVHGESLRSSLVGVVVPDTDVLPSFAAKLGVKG
SFEELCQNQVVREAILEDLQKIGKESGLKTFEQVKAIFLHPEPFSIENGLLTPTLKAKRG
ELSKYFRTQIDSLYEHIQD

Here rs****** is the SNP ID from the NCBI dbSNP database, and cs****** is the cancer-related variation ID from CanProVar. Each variation is written as ID:XnY, where X is the original amino acid, n is its position in the sequence, and Y is the variant amino acid.

dbSNP_validated* contains variation information from validated coding SNPs in the NCBI dbSNP database.
cancer_* contains mutations that have been reported in cancer samples.
all_* contains the nsSNPs from both the dbSNP_validated* file and the cancer_* file, with redundant entries removed.
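
Because the sequences are stored unmutated, anyone who needs a variant sequence has to read the variations from the header line and apply them. The following is a minimal Python sketch of that step, not an official CanProVar tool. It assumes the header layout seen in the example above: whitespace-separated protein ID, transcript ID, and a semicolon-separated variation list in which each entry is ID:XnY with single-letter amino acids and a 1-based position. The names Variant, parse_header, and apply_variant are hypothetical, and headers that deviate from this layout (for instance, entries with no variations or other notations) may need extra handling.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Variant(NamedTuple):
    var_id: str   # e.g. "rs3736946" (dbSNP) or "cs3459" (CanProVar)
    ref: str      # original amino acid, single-letter code
    pos: int      # position in the protein, assumed 1-based
    alt: str      # variant amino acid, single-letter code

    @property
    def source(self) -> str:
        # Prefix convention taken from the description: rs = dbSNP, cs = CanProVar.
        if self.var_id.startswith("rs"):
            return "dbSNP"
        if self.var_id.startswith("cs"):
            return "CanProVar"
        return "unknown"


# Assumed entry format: ID:XnY (e.g. "rs3736946:M238V").
_VARIANT_RE = re.compile(r"^([^:]+):([A-Z*])(\d+)([A-Z*])$")


def parse_header(header: str) -> Tuple[str, Optional[str], List[Variant]]:
    """Split a header line into protein ID, transcript ID, and variations."""
    fields = header.strip().lstrip(">").split()
    if not fields:
        raise ValueError("empty FASTA header")
    protein_id = fields[0]
    transcript_id = fields[1] if len(fields) > 1 else None
    variants: List[Variant] = []
    if len(fields) > 2:
        for item in fields[2].split(";"):
            match = _VARIANT_RE.match(item)
            if match is None:
                continue  # skip entries that do not follow the assumed format
            var_id, ref, pos, alt = match.groups()
            variants.append(Variant(var_id, ref, int(pos), alt))
    return protein_id, transcript_id, variants


def apply_variant(sequence: str, variant: Variant) -> str:
    """Return a copy of the sequence with a single substitution applied."""
    index = variant.pos - 1  # convert the 1-based position to a 0-based index
    if index >= len(sequence) or sequence[index] != variant.ref:
        raise ValueError(f"reference residue mismatch for {variant.var_id}")
    return sequence[:index] + variant.alt + sequence[index + 1:]


if __name__ == "__main__":
    example = (">ENSP00000348429 ENST00000356116 "
               "rs3736946:M238V;cs3459:K444R;cs2435:G466D;rs12254915:T542A")
    protein_id, transcript_id, variants = parse_header(example)
    for v in variants:
        print(protein_id, v.var_id, v.source, f"{v.ref}{v.pos}{v.alt}")
```

The reference-residue check in apply_variant is a deliberate safeguard: if the residue at the stated position does not match, the position convention or the sequence version is probably not what the sketch assumes, and it is safer to stop than to produce a wrong variant sequence.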