CanProVar provides a human protein database (Ensembl v79) for download in FASTA format, in which variation information is recorded in the header line of each sequence.
The README file explains the contents of the following files:

  | Description | Protein (FASTA) | Statistics (total) |
---|---|---|---|
Validated dbSNP_nsSNPs | variation information from validated coding SNPs | dbSNP_validated_nsSNP_protein | 967,017 |
Cancer related_nsSNPs | mutations that have been reported in cancer samples | cancer_nsSNP_protein | 156,671 |
Both | nsSNPs from both the dbSNP_validated* file and the cancer_* file | all_nsSNP_protein | 1,123,688 |

The csSNPs of each cancer type
MS-CanProVar (version 2.0) is a protein sequence database that includes variation information to facilitate peptide variant detection in shotgun proteomics. In the .fasta file, each variant peptide is included as an independent entry, and its variations are annotated in the header line: "rs" labels SNPs and "cs" labels cancer-related mutations. For details about the MS-CanProVar database, please refer to Li et al., "A bioinformatics workflow for variant peptide detection in shotgun proteomics," MCP, 2011. The current version of MS-CanProVar is based on Ensembl v79.
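
As a rough illustration of working with header annotations like these, the sketch below reads FASTA records and collects any identifiers that begin with "rs" or "cs". The exact header layout is not described on this page, so the regular expression, the function names `read_fasta` and `variant_labels`, and the sample records are all assumptions made for illustration; consult the README for the real format before relying on it.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: a variant label is "rs" or "cs" followed by digits (e.g. rs12345).
# The real header layout may differ; adjust the pattern after reading the README.
VARIANT_ID = re.compile(r"(?<![A-Za-z])(rs|cs)(\d+)")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def variant_labels(header: str) -> Dict[str, List[str]]:
    """Group the identifiers found in a header line by their rs/cs prefix."""
    found: Dict[str, List[str]] = {"rs": [], "cs": []}
    for match in VARIANT_ID.finditer(header):
        found[match.group(1)].append(match.group(0))
    return found


# Made-up records for illustration only; real entries may look different.
example = [
    ">ENSP00000000001 rs12345:A10T",
    "MKTAYIAKQR",
    ">ENSP00000000002 cs67890:G5R rs222:P7L",
    "MSGSGRLLAA",
]

for header, sequence in read_fasta(example):
    print(header.split()[0], variant_labels(header), len(sequence))
```

To run this against a downloaded file, pass an open file handle to `read_fasta` in place of the `example` list.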
Dr. Jing Li's Group
©2015 Menghuan Zhang, Jing Li