This workflow offers several functionalities to explore the consequences of protein mutations. It reports features that overlap the mutations or that are in close physical proximity to them.
The features reported include protein domains, variants, helices, ligand-binding residues, catalytic sites, transmembrane domains, InterPro domains, and known somatic mutations in different types of cancer. This information is extracted from resources such as UniProt, COSMIC, InterPro and Appris. The workflow can also identify mutations affecting the interfaces of protein complexes.
This workflow uses PDB files to identify residues in close physical proximity. This information is used to find features close to the mutations (within 5 angstroms) and mutations in residues close to residues of a complex partner (within 8 angstroms).
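The two distance thresholds can be illustrated with a short sketch. It assumes, as a simplification, that two residues are "close" when any pair of their atoms lies within the cutoff; the exact distance criterion used by Structure-PPi is not stated here, and the residue labels and coordinates below are toy values, not real structure data.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]
# Residues keyed by a label such as "A:257" (chain:position), each with its atom coordinates.
Residues = Dict[str, List[Coord]]

FEATURE_CUTOFF = 5.0    # angstroms: neighbours used to find features close to a mutation
INTERFACE_CUTOFF = 8.0  # angstroms: residues close to a complex partner


def min_atom_distance(a: List[Coord], b: List[Coord]) -> float:
    """Smallest Euclidean distance between any atom of residue a and any atom of residue b."""
    return min(math.dist(p, q) for p in a for q in b)


def residue_pairs_within(first: Residues, second: Residues, cutoff: float) -> Set[Tuple[str, str]]:
    """Return (first, second) residue pairs whose closest atoms are at most `cutoff` apart."""
    return {
        (r1, r2)
        for r1, atoms1 in first.items()
        for r2, atoms2 in second.items()
        if min_atom_distance(atoms1, atoms2) <= cutoff
    }


# Toy coordinates: residue B:5 sits 6 angstroms from A:10 and 14 angstroms from A:11.
chain_a = {"A:10": [(0.0, 0.0, 0.0)], "A:11": [(20.0, 0.0, 0.0)]}
chain_b = {"B:5": [(6.0, 0.0, 0.0)]}

print(residue_pairs_within(chain_a, chain_b, INTERFACE_CUTOFF))  # {('A:10', 'B:5')}
print(residue_pairs_within(chain_a, chain_b, FEATURE_CUTOFF))    # set()
```

With these toy coordinates, A:10 counts as an interface residue at the 8 angstrom threshold but would not be a neighbour at the stricter 5 angstrom threshold used for feature proximity.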
PDB files are obtained from Interactome3D, which organizes thousands of PDB files, covering both experimental structures and structural models, for individual proteins and protein complexes.
Pairwise (Smith-Waterman) alignment is used to reconcile inconsistencies between the protein sequences found in PDB files, UniProt and Ensembl (identified by Ensembl Protein ID).
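A minimal sketch of how a Smith-Waterman local alignment can map residue positions between two versions of a protein sequence (for example, a UniProt sequence and the sequence observed in a PDB file). The scoring scheme here (match +2, mismatch -1, linear gap -2) is an assumption chosen for illustration; a real implementation would typically use a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
from typing import Dict


def smith_waterman_map(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> Dict[int, int]:
    """Locally align a and b; map 1-based positions of a to the aligned 1-based positions of b."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    # Fill the local-alignment score matrix (scores never drop below zero).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best-scoring cell until a zero cell is reached.
    mapping: Dict[int, int] = {}
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        score = H[i][j]
        if score == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            mapping[i] = j  # aligned pair (match or mismatch)
            i, j = i - 1, j - 1
        elif score == H[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return mapping


# Toy example: the structure sequence lacks the first two residues of the reference.
print(smith_waterman_map("MTEYKLVVVG", "EYKLVVVG")[7])  # 5
```

Position 7 of the reference sequence corresponds to position 5 of the structure sequence, so a mutation reported on one sequence can be translated to the other before looking up structural neighbours.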
Vazquez M, Valencia A, Pons T. Structure-PPi: a module for the annotation of cancer-related single-nucleotide variants at protein-protein interfaces. Bioinformatics. 2015;31(14):2397-2399. doi:10.1093/bioinformatics/btv142
Use the following textbox to input your mutations and retrieve all
annotations, including neighbours and interfaces. This method is limited
to 1000 variants; use the other (more granular) tasks if your mutation set
is larger. Mutations can be specified as genomic mutations or as protein
mutations (e.g. ENSP00000382976:L257R). For protein mutations, any
identifier can be used instead of the Ensembl Protein ID, such as the
Associated Gene Name or gene symbol (e.g. KRAS:G12V).
If genomic mutations are given, only principal isoforms are considered. If
the protein is specified with any identifier other than the Ensembl Protein
ID, it is translated to an Ensembl Gene ID and its principal isoform is then
extracted from Appris. For instance, if a mutation is given using a
UniProt/SwissProt Accession and the change is relative to the sequence
reported in UniProt, inconsistencies may appear due to wrong isoform
mappings or to discrepancies between the sequences. No attempt is made to
fix such inconsistencies in this wizard.
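The protein-mutation syntax described above can be checked before submission with a small parser. This is a sketch under assumptions: the `IDENTIFIER:<ref><position><alt>` pattern is inferred from the examples ENSP00000382976:L257R and KRAS:G12V, genomic mutations are not handled, and all function and constant names are hypothetical rather than part of the workflow.

```python
import re
from typing import List, NamedTuple, Optional

MAX_WIZARD_VARIANTS = 1000  # limit stated for the wizard

# Assumed syntax: IDENTIFIER:<ref><position><alt>, e.g. ENSP00000382976:L257R or KRAS:G12V
MUTATION_RE = re.compile(r"^(?P<protein>[^:\s]+):(?P<ref>[A-Z*])(?P<pos>\d+)(?P<alt>[A-Z*])$")


class ProteinMutation(NamedTuple):
    protein: str   # Ensembl Protein ID or another identifier such as a gene symbol
    ref: str       # reference amino acid, one-letter code
    position: int  # 1-based position in the protein sequence
    alt: str       # alternative amino acid, one-letter code


def parse_protein_mutation(text: str) -> Optional[ProteinMutation]:
    """Parse one protein mutation; return None if the text does not match the assumed syntax."""
    m = MUTATION_RE.match(text.strip())
    if m is None:
        return None
    return ProteinMutation(m["protein"], m["ref"], int(m["pos"]), m["alt"])


def needs_isoform_translation(mutation: ProteinMutation) -> bool:
    """True if the identifier is not an Ensembl Protein ID, so it would have to be mapped
    to an Ensembl Gene ID and its principal isoform before annotation."""
    return re.fullmatch(r"ENSP\d+", mutation.protein) is None


def parse_wizard_input(lines: List[str]) -> List[ProteinMutation]:
    """Parse non-empty lines, enforcing the 1000-variant limit and rejecting malformed entries."""
    entries = [line for line in lines if line.strip()]
    if len(entries) > MAX_WIZARD_VARIANTS:
        raise ValueError(f"{len(entries)} variants given; the wizard accepts at most {MAX_WIZARD_VARIANTS}")
    parsed = []
    for line in entries:
        mutation = parse_protein_mutation(line)
        if mutation is None:
            raise ValueError(f"cannot parse mutation: {line!r}")
        parsed.append(mutation)
    return parsed


for mut in parse_wizard_input(["ENSP00000382976:L257R", "KRAS:G12V"]):
    print(mut, needs_isoform_translation(mut))
```

Flagging non-Ensembl identifiers up front makes it easy to spot the entries whose isoform mapping, as noted above, may not match the sequence the change was reported against.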
The organism is assumed to be Hsa/feb2014. If genomic mutations are
introduced, they are assumed to be relative to the Watson (forward) strand.
While Structure-PPi itself is not intended to be a stand-alone damage
predictor, we provide a score, the Structure-PPi feature score, that
quantifies the protein features overlapping or close to each mutation. The
score is the sum of individual scores for the different features. The score
each feature contributes was selected based on expert opinion, guided by
empirical results on 1000 Genomes data. The scoring scheme is as follows:
- Appris features: we add 2 if at least one ligand-binding or catalytic site annotated in firestar is affected; if none of the affected features meets this condition, we add only 1.
- COSMIC mutations: we add 3 if more than ten COSMIC samples have mutations overlapping the residue, 2 if more than five, and 1 if more than one. We add nothing if just one sample is found.
- UniProt variants: we add 1 if the position has at least one annotated variant. If at least one of these variants is also annotated as `Disease`, we add 2 more. If none is classified as `Disease` but at least one is annotated as `Unclassified`, we add 1 more. If all are annotated as `Polymorphism`, we add nothing more.
- UniProt features: we add 1 if any of the following features are affected: MUTAGEN, DISULFID, DNA_BIND, METAL, INTRAMEM, CROSSLNK. These features are affected more than twice as often in COSMIC as in 1000 Genomes. MUTAGEN entries are only considered if the description field does not include the text 'No effect'.
- Affected interfaces: we add 2 if any protein-protein interaction surface is affected.
These scores are calculated for the direct hits and for the neighbour
hits (except for affected interfaces, which do not apply to neighbours).
Scores for neighbours are divided by 2. The final tally is reported under
the "Damage predictions" section of the wizard report.
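The scoring rules can be restated as a short function. This is an illustrative sketch, not the workflow's code: the `Hits` record and its field names are hypothetical, and it assumes that all neighbour hits for a mutation have already been pooled into a single record before the neighbour score is halved.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# UniProt feature types that contribute 1 point when affected.
UNIPROT_FEATURES = {"MUTAGEN", "DISULFID", "DNA_BIND", "METAL", "INTRAMEM", "CROSSLNK"}


@dataclass
class Hits:
    """Features hit directly or by neighbours (hypothetical field names)."""
    appris_features: int = 0      # number of affected Appris features
    firestar_site: bool = False   # a ligand-binding or catalytic site annotated in firestar is affected
    cosmic_samples: int = 0       # COSMIC samples with mutations overlapping the residue
    variant_classes: List[str] = field(default_factory=list)  # "Disease", "Unclassified", "Polymorphism"
    uniprot_features: List[Tuple[str, str]] = field(default_factory=list)  # (type, description)
    interface_affected: bool = False  # a protein-protein interaction surface is affected


def feature_score(h: Hits, include_interfaces: bool = True) -> int:
    score = 0
    if h.appris_features:
        score += 2 if h.firestar_site else 1
    if h.cosmic_samples > 10:
        score += 3
    elif h.cosmic_samples > 5:
        score += 2
    elif h.cosmic_samples > 1:
        score += 1
    if h.variant_classes:
        score += 1
        if "Disease" in h.variant_classes:
            score += 2
        elif "Unclassified" in h.variant_classes:
            score += 1
    # MUTAGEN entries only count when their description lacks 'No effect'.
    if any(ftype in UNIPROT_FEATURES and not (ftype == "MUTAGEN" and "No effect" in desc)
           for ftype, desc in h.uniprot_features):
        score += 1
    if include_interfaces and h.interface_affected:
        score += 2
    return score


def structure_ppi_score(direct: Hits, neighbours: Hits) -> float:
    """Direct score plus half the neighbour score; interfaces are not scored for neighbours."""
    return feature_score(direct) + feature_score(neighbours, include_interfaces=False) / 2


direct = Hits(appris_features=1, firestar_site=True, cosmic_samples=7,
              variant_classes=["Polymorphism", "Disease"],
              uniprot_features=[("MUTAGEN", "No effect on binding")], interface_affected=True)
neighbours = Hits(cosmic_samples=12, uniprot_features=[("METAL", "Zinc")])
print(structure_ppi_score(direct, neighbours))  # 11.0
```

In this toy case the direct hits give 2 (firestar) + 2 (seven COSMIC samples) + 3 (a `Disease` variant) + 0 (MUTAGEN with 'No effect') + 2 (interface) = 9, and the neighbours give (3 + 1) / 2 = 2.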
The following files contain reports for all mutations in the COSMIC and 1000 Genomes databases. They were produced using the Structure-PPi and Sequence workflows. Due to the large size of these datasets, we skipped annotation with the `COSMIC` database itself, which would have resulted in massive result files.
- COSMIC:all - genomic_mutation_annotations/consequence
- COSMIC:all - genomic_mutation_annotations/mutation_genes
- COSMIC:all - genomic_mutation_annotations/mutation_mi_annotations
- COSMIC:all - mutated_isoform_annotations/Appris
- COSMIC:all - mutated_isoform_annotations/InterPro
- COSMIC:all - mutated_isoform_annotations/UniProt
- COSMIC:all - mutated_isoform_annotations/db_NSFP
- COSMIC:all - mutated_isoform_annotations/interfaces
- COSMIC:all - mutated_isoform_annotations/variants
- COSMIC:all - mutated_isoform_neighbour_annotations/Appris
- COSMIC:all - mutated_isoform_neighbour_annotations/InterPro
- COSMIC:all - mutated_isoform_neighbour_annotations/UniProt
- COSMIC:all - mutated_isoform_neighbour_annotations/variants
- Genomes1000:all - genomic_mutation_annotations/consequence
- Genomes1000:all - genomic_mutation_annotations/mutation_genes
- Genomes1000:all - genomic_mutation_annotations/mutation_mi_annotations
- Genomes1000:all - mutated_isoform_annotations/Appris
- Genomes1000:all - mutated_isoform_annotations/InterPro
- Genomes1000:all - mutated_isoform_annotations/UniProt
- Genomes1000:all - mutated_isoform_annotations/db_NSFP
- Genomes1000:all - mutated_isoform_annotations/interfaces
- Genomes1000:all - mutated_isoform_annotations/variants
- Genomes1000:all - mutated_isoform_neighbour_annotations/Appris
- Genomes1000:all - mutated_isoform_neighbour_annotations/InterPro
- Genomes1000:all - mutated_isoform_neighbour_annotations/UniProt
- Genomes1000:all - mutated_isoform_neighbour_annotations/variants
Annotates genomic mutations based on the protein features that are overlapping amino-acid changes
Annotates mutated isoforms based on the protein features that are overlapping amino-acid changes
Annotates mutated isoforms based on the protein features that are in close physical proximity to amino-acid changes
Annotates genomic mutations based on the protein features that are in close physical proximity to amino-acid changes
Find variants that affect residues in protein-protein interaction surfaces
Find mutated isoforms with affected residues in protein-protein interaction surfaces
Finds residues in close physical proximity to amino-acid changes in mutated isoforms
For a given PDB, find all pairs of residues that fall within a given 'distance' of each other. It uses PDBs from Interactome3D for individual proteins.
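A sketch of the residue-pair search on a PDB file, assuming the standard fixed-column layout of ATOM records (chain ID in column 22, residue number in columns 23-26, coordinates in columns 31-54). Insertion codes, alternate locations and HETATM records are ignored for simplicity, and the helper names are hypothetical; fetching files from Interactome3D is not shown.

```python
import itertools
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def parse_atoms(pdb_text: str) -> Dict[str, List[Coord]]:
    """Group ATOM coordinates by residue, keyed as 'chain:position'."""
    residues: Dict[str, List[Coord]] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        key = f"{line[21]}:{int(line[22:26])}"  # chain ID (col 22), residue number (cols 23-26)
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.setdefault(key, []).append(xyz)
    return residues


def residue_pairs(residues: Dict[str, List[Coord]], distance: float) -> List[Tuple[str, str]]:
    """All residue pairs with at least one pair of atoms within `distance` angstroms."""
    return [
        (r1, r2)
        for (r1, a1), (r2, a2) in itertools.combinations(residues.items(), 2)
        if any(math.dist(p, q) <= distance for p in a1 for q in a2)
    ]


def atom_line(serial: int, chain: str, resseq: int, x: float, y: float, z: float) -> str:
    """Build a minimal alanine CA ATOM record in the standard column layout (demo helper)."""
    return f"ATOM  {serial:5d}  CA  ALA {chain}{resseq:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"


demo = "\n".join([
    atom_line(1, "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, "A", 2, 3.8, 0.0, 0.0),
    atom_line(3, "A", 3, 20.0, 0.0, 0.0),
])
print(residue_pairs(parse_atoms(demo), 5.0))  # [('A:1', 'A:2')]
```

The all-pairs comparison is quadratic in the number of atoms, which is acceptable for a single protein; a spatial index such as a k-d tree would be the usual choice for large complexes.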
Use a PDB to find the residues neighbouring, in three-dimensional space, a particular residue in a given sequence.
Find the correspondence between sequence positions in a PDB and in a given sequence. PDB positions are reported as `chain:position`.
Translate the positions of amino-acids in a particular chain of the provided PDB into positions inside a given sequence.
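Once a sequence and a PDB chain have been aligned, translating positions in either direction reduces to walking the gapped alignment. The sketch below assumes the alignment is given as two equal-length strings with `-` for gaps, plus the PDB residue numbers of the chain in order (these need not start at 1 or be contiguous); the function name and inputs are hypothetical.

```python
from typing import Dict, List


def position_map(aligned_seq: str, aligned_pdb: str, chain: str, pdb_numbers: List[int]) -> Dict[int, str]:
    """Map 1-based sequence positions to 'chain:position' PDB labels from a gapped alignment."""
    if len(aligned_seq) != len(aligned_pdb):
        raise ValueError("aligned strings must have the same length")
    if len(aligned_pdb.replace("-", "")) != len(pdb_numbers):
        raise ValueError("need one PDB residue number per aligned PDB residue")
    mapping: Dict[int, str] = {}
    seq_pos = pdb_idx = 0
    for s, p in zip(aligned_seq, aligned_pdb):
        if s != "-":
            seq_pos += 1
        if p != "-":
            pdb_idx += 1
        if s != "-" and p != "-":
            # Both columns hold a residue: record the correspondence.
            mapping[seq_pos] = f"{chain}:{pdb_numbers[pdb_idx - 1]}"
    return mapping


# Toy alignment: the structure lacks the first two residues and residue 6.
seq_to_pdb = position_map("MTEYKLVVVG", "--EYK-VVVG", "A", [3, 4, 5, 7, 8, 9, 10])
pdb_to_seq = {label: pos for pos, label in seq_to_pdb.items()}  # reverse direction
print(seq_to_pdb[7], pdb_to_seq["A:3"])  # A:7 3
```

Positions that fall in a gap (here sequence positions 1, 2 and 6) have no structural counterpart, so no neighbours or interfaces can be computed for mutations at those positions.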
Produce a small table summarizing the mutation scores and a few of the features
Score a list of variants based on the report generated by the `wizard`. The limitation to 1000 variants still holds.
Translate the positions inside a given amino-acid sequence to positions in the sequence of a PDB by aligning them
Run a list of variants through all the analyses and produce a combined report. This analysis is limited to 1000 variants (use the other, more granular methods otherwise). Variants can be expressed as genomic mutations or protein mutations. When protein mutations are used, the protein can be named with its `Ensembl Protein ID` or with any other protein or gene identifier, including gene symbols (e.g. KRAS:G12V).