This workflow offers several functionalities to explore the consequences of protein mutations. It reports features that overlap the mutations, or that are in close physical proximity to them.
The features reported include protein domains, variants, helices, ligand binding residues, catalytic sites, transmembrane domains, InterPro domains, and known somatic mutations in different types of cancer. This information is extracted from resources such as UniProt, COSMIC, InterPro and Appris. It can also identify mutations affecting the interfaces of protein complexes.
This workflow makes use of PDB files to calculate residues in close proximity. This information is used to find features close to the mutations, at a distance of 5 angstroms, or mutations in residues close to residues in a complex partner, at a distance of up to 8 angstroms.
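The proximity idea can be illustrated with a minimal Python sketch. It assumes one representative coordinate per residue and uses invented coordinates; the source does not state which atoms the workflow measures between, so the `residues` table and the `neighbour_pairs` helper are hypothetical and only show how the two thresholds (5 angstroms for features, 8 angstroms for complex partners) would be applied.

```python
import math
from itertools import combinations

# Hypothetical coordinates (in angstroms), keyed by "chain:position".
# Assumption: one representative point per residue; values are invented.
residues = {
    "A:10": (0.0, 0.0, 0.0),
    "A:11": (3.8, 0.0, 0.0),
    "A:50": (0.0, 4.5, 0.0),
    "B:7": (0.0, 0.0, 6.0),
    "B:90": (30.0, 30.0, 30.0),
}


def neighbour_pairs(coords, cutoff):
    """Return all pairs of residue keys whose distance is <= cutoff."""
    pairs = []
    for (r1, p1), (r2, p2) in combinations(sorted(coords.items()), 2):
        if math.dist(p1, p2) <= cutoff:
            pairs.append((r1, r2))
    return pairs


def chain(key):
    """Extract the chain identifier from a "chain:position" key."""
    return key.split(":")[0]


# Residues of the same chain within 5 angstroms (feature neighbourhood).
intra = [(a, b) for a, b in neighbour_pairs(residues, 5.0) if chain(a) == chain(b)]

# Residues of different chains within 8 angstroms (interface candidates).
inter = [(a, b) for a, b in neighbour_pairs(residues, 8.0) if chain(a) != chain(b)]

print(intra)  # [('A:10', 'A:11'), ('A:10', 'A:50')]
print(inter)  # [('A:10', 'B:7'), ('A:11', 'B:7'), ('A:50', 'B:7')]
```

The all-pairs loop is quadratic in the number of residues, which is fine for a sketch; a spatial index would be the usual choice for large structures.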
PDBs are extracted from Interactome3d, which organizes thousands of PDBs, covering both experimental structures and structure models, of individual proteins and protein complexes.
Pairwise (Smith-Waterman) alignment is used to reconcile inconsistencies between the protein sequences found in PDBs, UniProt and Ensembl (`Ensembl Protein ID`).
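As an illustration of how a local alignment can translate positions between two versions of a sequence, here is a small Smith-Waterman sketch in Python. The scoring parameters (`match`, `mismatch`, `gap`), the linear gap penalty and the function name `smith_waterman_map` are assumptions made for this example; the source does not specify the scoring scheme the workflow uses.

```python
def smith_waterman_map(a, b, match=2, mismatch=-1, gap=-2):
    """Locally align a and b; return {position_in_a: position_in_b}.

    Positions are 1-based and only aligned (non-gap) columns of the
    best-scoring local alignment are reported.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_cell = 0, (0, 0)

    # Fill the dynamic-programming matrix, remembering the best cell.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)

    # Trace back from the best cell until the local alignment starts.
    mapping = {}
    i, j = best_cell
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            mapping[i] = j
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return mapping


# Illustrative sequences: a short structure fragment and a longer
# reference sequence that contains it starting at position 5.
fragment = "KLVVVGAGG"
reference = "MTEYKLVVVGAGGVGKS"

position_map = smith_waterman_map(fragment, reference)
print(position_map[1], position_map[9])  # 5 13
```

The same mapping still holds when the fragment carries a single substitution (for example `KLVVVGAVG`), which is the kind of discrepancy between sources that the alignment step is meant to absorb.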
Reference:
Vazquez M, Valencia A, Pons T. (2015) Structure-PPi: a module for the annotation of cancer-related single-nucleotide variants at protein-protein interfaces. Bioinformatics 31(14):2397-2399 (doi: 10.1093/bioinformatics/btv142)
Wizard
Use the following textbox to input your mutations and retrieve all annotations, including neighbours and interfaces. This method is limited to 1000 variants; use the other (more granular) tasks if your mutation set is larger. Mutations can be specified as a genomic mutation (`18:6237978:G`), as a mutated isoform (`ENSP00000382976:L257R`), or using any identifier in place of the `Ensembl Protein ID`, such as the `Associated Gene Name` or gene symbol (`KRAS:G12V`).
If genomic mutations are given, only principal isoforms are considered. If the protein is specified with any id other than `Ensembl Protein ID`, it will be translated to `Ensembl Gene ID` and then its principal isoform will be extracted from Appris. For instance, if the mutation is given using `UniProt/SwissProt Accession`, and the change is relative to the sequence reported in UniProt, inconsistencies may appear from wrong isoform mappings or due to discrepancies in the sequence. No attempt is made to fix such inconsistencies in this wizard.
The organism is assumed to be `Hsa/feb2014`. If genomic mutations are introduced, they are assumed to be relative to the Watson (forward) strand.
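The input formats described above can be told apart with a few lines of Python. This is only a sketch: the `parse_mutation` helper and its output fields are hypothetical, it handles simple single amino-acid substitutions only, and the parser actually used by the workflow may accept more variation than this.

```python
import re

# Assumption: a protein change is written as <ref><position><alt>, e.g. L257R.
PROTEIN_CHANGE = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def parse_mutation(text):
    """Classify a mutation string as genomic or protein and split its parts."""
    parts = text.strip().split(":")

    # Genomic mutation: chromosome:position:allele
    if len(parts) == 3 and parts[1].isdigit():
        chromosome, position, allele = parts
        return {
            "kind": "genomic",
            "chromosome": chromosome,
            "position": int(position),
            "allele": allele,
        }

    # Protein mutation: identifier:change, where the identifier may be an
    # Ensembl Protein ID, a gene symbol, or another protein or gene id.
    if len(parts) == 2:
        found = PROTEIN_CHANGE.match(parts[1])
        if found:
            ref, position, alt = found.groups()
            return {
                "kind": "protein",
                "protein": parts[0],
                "ref": ref,
                "position": int(position),
                "alt": alt,
            }

    raise ValueError(f"Unrecognized mutation format: {text!r}")


for example in ["18:6237978:G", "ENSP00000382976:L257R", "KRAS:G12V"]:
    print(example, "->", parse_mutation(example)["kind"])
```

Note that the sketch does not translate identifiers or pick principal isoforms; those steps depend on the Ensembl and Appris resources described above.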
Scores
While Structure-PPi itself is not intended to be a stand-alone damage predictor, we provide a score, the `Structure-PPi feature score`, that quantifies the protein features that overlap or are close to each mutation. The score is built by adding individual scores for the different features. The individual score that each feature contributes has been selected based on expert opinion and guided by empirical results on the `COSMIC` and `1000 Genomes` data. The scoring scheme is as follows:
- Appris features: we add 2 if at least one ligand binding or catalytic site annotated in `firestar` is affected; if none of the affected features meets this condition we add only 1.
- COSMIC mutations: 3 if more than ten COSMIC samples have mutations overlapping the residue, 2 if more than five, and 1 if more than one sample. We add nothing if just one sample is found.
- UniProt variants: 1 if the position has at least one variant annotated. If at least one of these variants is also annotated as `Disease` we add 2 more. If none is classified as `Disease` but at least one is annotated as `Unclassified` we add 1 more. If all are annotated as `Polymorphism` we add nothing more.
- UniProt features: we add 1 if any of the following features is affected: `MUTAGEN, DISULFID, DNA_BIND, METAL, INTRAMEM, CROSSLNK`. These features show a frequency that is more than double in COSMIC with respect to 1000 Genomes. MUTAGEN entries are only considered if the description field does not include the text ‘No effect’.
- Affected interfaces: we add 2 if any protein-protein interaction surface is affected.
These scores are calculated for the direct hits and for the neighbour hits (with the exception of affected interfaces, where it doesn’t apply). Scores for neighbours are divided by 2. The final tally is reported under the section `Damage predictions` in the wizard report.
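The scoring scheme can be written down as a short Python sketch. The `Hits` container, its field names and the `other_appris_feature` label are hypothetical and chosen only for this example; a `firestar` entry stands for an affected ligand binding or catalytic site annotated by firestar. The sketch reflects one reading of the rules above and is not the workflow's own code.

```python
from dataclasses import dataclass, field

SCORED_UNIPROT_FEATURES = {
    "MUTAGEN", "DISULFID", "DNA_BIND", "METAL", "INTRAMEM", "CROSSLNK",
}


@dataclass
class Hits:
    """Features found for one mutation (hypothetical layout)."""
    appris_features: list = field(default_factory=list)        # e.g. ["firestar"]
    cosmic_samples: int = 0                                     # samples at the residue
    uniprot_variant_types: list = field(default_factory=list)  # e.g. ["Disease"]
    uniprot_features: list = field(default_factory=list)       # (type, description)


def feature_score(hits):
    """Score one set of hits (direct or neighbour) following the scheme."""
    score = 0

    # Appris: 2 for a firestar site, 1 for any other affected feature.
    if hits.appris_features:
        score += 2 if "firestar" in hits.appris_features else 1

    # COSMIC: thresholds on the number of samples; a single sample adds nothing.
    if hits.cosmic_samples > 10:
        score += 3
    elif hits.cosmic_samples > 5:
        score += 2
    elif hits.cosmic_samples > 1:
        score += 1

    # UniProt variants: 1 for any variant, plus 2 for Disease,
    # or plus 1 for Unclassified when there is no Disease variant.
    if hits.uniprot_variant_types:
        score += 1
        if "Disease" in hits.uniprot_variant_types:
            score += 2
        elif "Unclassified" in hits.uniprot_variant_types:
            score += 1

    # UniProt features: 1 if any scored feature is affected, skipping
    # MUTAGEN entries whose description contains 'No effect'.
    for feature_type, description in hits.uniprot_features:
        if feature_type not in SCORED_UNIPROT_FEATURES:
            continue
        if feature_type == "MUTAGEN" and "No effect" in description:
            continue
        score += 1
        break

    return score


def structure_ppi_score(direct, neighbours, interface_affected):
    """Direct score, plus half the neighbour score, plus 2 for an interface."""
    total = feature_score(direct) + feature_score(neighbours) / 2
    if interface_affected:
        total += 2
    return total


# Invented example values, for illustration only.
direct = Hits(
    appris_features=["firestar"],
    cosmic_samples=12,
    uniprot_variant_types=["Disease", "Polymorphism"],
    uniprot_features=[("MUTAGEN", "No effect")],
)
neighbours = Hits(cosmic_samples=6, uniprot_features=[("METAL", "Zinc")])

print(structure_ppi_score(direct, neighbours, interface_affected=True))  # 11.5
```

In the example the direct hits contribute 8 (2 + 3 + 3 + 0), the neighbour hits contribute 3 halved to 1.5, and the affected interface adds 2.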
Precomputed results
The following files contain reports for all mutations in the COSMIC and 1000 Genomes databases. They were produced using the Structure-PPi and Sequence workflows. Due to the large size of these datasets, we have skipped annotation with the `COSMIC` database itself, which would have resulted in massive result files.
- COSMIC:all - genomic_mutation_annotations/consequence
- COSMIC:all - genomic_mutation_annotations/mutation_genes
- COSMIC:all - genomic_mutation_annotations/mutation_mi_annotations
- COSMIC:all - mutated_isoform_annotations/Appris
- COSMIC:all - mutated_isoform_annotations/InterPro
- COSMIC:all - mutated_isoform_annotations/UniProt
- COSMIC:all - mutated_isoform_annotations/db_NSFP
- COSMIC:all - mutated_isoform_annotations/interfaces
- COSMIC:all - mutated_isoform_annotations/variants
- COSMIC:all - mutated_isoform_neighbour_annotations/Appris
- COSMIC:all - mutated_isoform_neighbour_annotations/InterPro
- COSMIC:all - mutated_isoform_neighbour_annotations/UniProt
- COSMIC:all - mutated_isoform_neighbour_annotations/variants
- Genomes1000:all - genomic_mutation_annotations/consequence
- Genomes1000:all - genomic_mutation_annotations/mutation_genes
- Genomes1000:all - genomic_mutation_annotations/mutation_mi_annotations
- Genomes1000:all - mutated_isoform_annotations/Appris
- Genomes1000:all - mutated_isoform_annotations/InterPro
- Genomes1000:all - mutated_isoform_annotations/UniProt
- Genomes1000:all - mutated_isoform_annotations/db_NSFP
- Genomes1000:all - mutated_isoform_annotations/interfaces
- Genomes1000:all - mutated_isoform_annotations/variants
- Genomes1000:all - mutated_isoform_neighbour_annotations/Appris
- Genomes1000:all - mutated_isoform_neighbour_annotations/InterPro
- Genomes1000:all - mutated_isoform_neighbour_annotations/UniProt
- Genomes1000:all - mutated_isoform_neighbour_annotations/variants
Tasks
- score_summary: Run the entire complement of analyses over a set of (genomic or protein) variants and produce a report with scores to highlight the most relevant ones. Limited to 1000 variants.
- annotate_mi: Annotates protein mutations based on the protein features that overlap the amino-acid changes.
- annotate_dna: Annotates genomic mutations based on the protein features that overlap the amino-acid changes.
- annotate_mi_neighbours: Annotates protein mutations based on the protein features that are in close physical proximity to the amino-acid changes.
- annotate_dna_neighbours: Annotates genomic mutations based on the protein features that are in close physical proximity to the amino-acid changes.
- mi_interfaces: Finds protein mutations that affect residues in protein-protein interaction surfaces.
- dna_interfaces: Finds genomic mutations that affect residues in protein-protein interaction surfaces.
- mi_neighbours: Finds residues in physical proximity to the amino-acid changes in protein mutations.
- dna_neighbours: Finds residues in physical proximity to the amino-acid changes derived from genomic mutations.
- wizard: Run a list of variants through all the analyses and produce a combined report. This analysis is limited to 1000 variants (use the other, more granular methods otherwise). Variants can be expressed as genomic mutations or protein mutations. When protein mutations are used, the name of the protein can be an `Ensembl Protein ID` or any other protein or gene identifier, including gene symbols (e.g. KRAS:G12V).
- mi_wizard: Run a list of protein variants through all the analyses and produce a combined report. This analysis is limited to 1000 variants (use the other, more granular methods otherwise). The name of the protein can be an `Ensembl Protein ID` or any other protein or gene identifier, including gene symbols (e.g. KRAS:G12V).
- dna_wizard: Run a list of genomic variants through all the analyses and produce a combined report. This analysis is limited to 1000 variants (use the other, more granular methods otherwise).
- scores: Score a list of variants based on the report generated by the `wizard`. The limitation to 1000 variants still holds. Used by `score_summary`.
- neighbour_map: For a given PDB, find all pairs of residues that fall within a given `distance` of each other.
- neighbours_in_pdb: Use a PDB to find the residues neighbouring, in three-dimensional space, a particular residue in a given sequence.
- pdb_alignment_map: Find the correspondence between sequence positions in a PDB and in a given sequence. PDB positions are reported as `chain:position`.
- pdb_chain_position_in_sequence: Translate the positions of amino-acids in a particular chain of the provided PDB into positions inside a given sequence.
- sequence_position_in_pdb: Translate the positions inside a given amino-acid sequence to positions in the sequence of a PDB by aligning them.