Download CanProVar Data
CanProVar provides human protein databases (Ensembl v79) for download in FASTA format; variation information is recorded in the header line of each sequence.
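Because the variation information lives in the header lines, a first step in working with these files is usually to split each record into its header and sequence. The following is a minimal sketch of a FASTA reader, not the CanProVar authors' tooling. The record names and header text in the example are hypothetical placeholders; the actual header layout is described in the README and is not assumed here.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    The header is returned without the leading '>' and is otherwise
    left untouched, so any variation fields remain available to the caller.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical records; real CanProVar headers will look different.
example = """>ENSP_EXAMPLE_1 hypothetical header text with variation fields
MKTAYIAKQR
QISFVKSHFS
>ENSP_EXAMPLE_2 another hypothetical header
MSLLTEVETY
""".splitlines()

records = list(read_fasta(example))
for header, sequence in records:
    print(header, len(sequence))
```

To read a downloaded file instead of the in-memory example, pass an open file handle to `read_fasta`, since a file object iterates over its lines.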

The README file explains the contents of the following files:

Dataset | Description | Protein (FASTA) | Statistics
Validated dbSNP_nsSNPs | Variation information from validated coding SNPs | dbSNP_validated_nsSNP_protein | 967,017
Cancer related_nsSNPs | Mutations that have been reported in cancer samples | cancer_nsSNP_protein | 156,671
Both | nsSNPs from both the dbSNP_validated* file and the cancer_* file | all_nsSNP_protein | 1,123,688

The csSNPs of each cancer type
Cancer Name Statistics Protein(FASTA)
Adrenal Gland Neoplasms 15
Biliary Tract Cancer 570
Bone Neoplasms 415
Brain Cancer 327
Breast Cancer 17460
Central Nervous System Neoplasms 5905
Colorectal Cancer 1127
Esophageal Cancer 2246
Gastric Cancer 6693
Head and Neck Cancer 9450
Hepatocellular Carcinoma 5143
Intestines Cancer 18090
Leukemia 8134
Lung Cancer 14382
Lymphoma 1110
Melanoma 11659
Myeloproliferative Disorders 53
Neoplasms by Histologic Type 1491
Non-small cell lung carcinoma 60
Oral Cancer 3
Ovarian Cancer 17303
Pancreatic Cancer 6353
Parathyroid Carcinoma 240
Pituitary Carcinoma 140
Prostate Cancer 1804
Renal Cancer 4183
Sarcoma 31
Skin Cancer 6137
Small cell lung carcinoma 34
Testicular Cancer 91
Thyroid Carcinoma 1804
Urinary Bladder Cancer 3718
Uterine Cancer 8594
Vulva Cancer 172
acute lymphocytic leukemia 6
acute myeloid leukemia 8
breast ductal carcinoma 23
chronic lymphocytic leukemia 198
chronic myeloid leukemia 21
follicular thyroid carcinoma 4
pancreatic ductal adenocarcinoma 11

Download MS-CanProVar Data
MS-CanProVar (version 2.0) is a protein sequence database that includes variation information to facilitate peptide variant detection in shotgun proteomics. In the .fasta file, each variant peptide is included as an independent entry, and its variations are annotated in the header line, labeled "rs" for SNPs and "cs" for cancer-related mutations. For details about the MS-CanProVar database, please refer to Li et al., "A bioinformatics workflow for variant peptide detection in shotgun proteomics", MCP, 2011. The current version of MS-CanProVar is based on Ensembl v79.
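The "rs" and "cs" labels make it possible to tally SNPs and cancer-related mutations directly from the header lines. The sketch below is only illustrative: it assumes that each label appears as the prefix "rs" or "cs" followed by digits somewhere in the header. That pattern, the function name, and the example headers are assumptions made here for illustration, so check them against the actual files before relying on them.

```python
import re
from collections import Counter

# Assumption: a variation label is "rs" or "cs" followed by digits.
# The real MS-CanProVar header syntax may differ.
VARIANT_ID = re.compile(r"\b(rs|cs)\d+\b")


def count_variant_labels(headers):
    """Count how many 'rs' and 'cs' labels occur across the given headers."""
    counts = Counter()
    for header in headers:
        for match in VARIANT_ID.finditer(header):
            counts[match.group(1)] += 1
    return counts


# Hypothetical header lines, invented for this example.
hypothetical_headers = [
    "PEP_EXAMPLE_1 rs12345:A10T",
    "PEP_EXAMPLE_2 cs67890:G25D",
    "PEP_EXAMPLE_3 rs111:K5N cs222:P9L",
    "PEP_EXAMPLE_4 no variation",
]

label_counts = count_variant_labels(hypothetical_headers)
print(dict(label_counts))
```

Counting label occurrences rather than entries keeps the tally meaningful if a single variant peptide carries more than one variation.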

Dr. Jing Li's Group

©2015 Menghuan Zhang, Jing Li