Workflows
The following workflows are installed in this system:
-
Proteomics
This workflow offers several functionalities to explore the consequences of protein mutations. It reports features that overlap the mutations, or that are in close physical proximity to them.
The features reported include protein domains, variants, helices, ligand binding residues, catalytic sites, transmembrane domains, InterPro domains, and known somatic mutations in different types of cancer. This information is extracted from resources such as UniProt, COSMIC, InterPro and Appris. It can also identify mutations affecting the interfaces of protein complexes.
This workflow uses PDB files to calculate which residues are in close proximity. This information is used to find features close to the mutations, within 5 angstroms, or mutations in residues close to residues in a complex partner, at a distance of up to 8 angstroms.
PDBs are extracted from Interactome3D, which organizes thousands of PDBs, for both experimental structures and structure models, of individual proteins and protein complexes.
Pairwise (Smith-Waterman) alignment is used to reconcile inconsistencies between the protein sequences in PDBs, UniProt and Ensembl.
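As a rough illustration of the proximity thresholds described above, the following minimal sketch finds residue pairs within a given distance. It assumes one representative coordinate per residue and `chain:position` labels; the function name `neighbour_pairs` and the coordinates are hypothetical, and the workflow's actual distance computation over PDB atoms may differ.

```python
from itertools import combinations
from math import dist


def neighbour_pairs(coords, distance=5.0):
    """Return pairs of residue labels whose points lie within `distance` angstroms."""
    pairs = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sorted(coords.items()), 2):
        if dist(xyz_a, xyz_b) <= distance:
            pairs.append((res_a, res_b))
    return pairs


# Hypothetical coordinates in angstroms, one point per residue (an assumption)
coords = {
    "A:10": (0.0, 0.0, 0.0),
    "A:11": (3.0, 0.0, 0.0),
    "A:50": (0.0, 4.0, 2.0),
    "B:7": (0.0, 0.0, 7.5),
}

# Residues close to each other, using the 5 angstrom threshold
within_5 = neighbour_pairs(coords, distance=5.0)

# Candidate interface pairs: different chains, using the 8 angstrom threshold
interface_8 = [
    (a, b)
    for a, b in neighbour_pairs(coords, distance=8.0)
    if a.split(":")[0] != b.split(":")[0]
]
```

Comparing all pairs is quadratic in the number of residues; a spatial index would be a natural replacement for large structures.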
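To make the alignment step concrete, here is a minimal Smith-Waterman sketch that maps positions of one sequence onto another. The scoring values (match 2, mismatch -1, linear gap -2) and the function name `smith_waterman_map` are assumptions chosen for illustration, not the parameters the workflow uses.

```python
def smith_waterman_map(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Locally align two sequences; return {position in seq_a: position in seq_b} (1-based)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_cell = 0, (0, 0)

    def substitution(i, j):
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    # Fill the scoring matrix; local alignment never lets a cell drop below zero
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + substitution(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)

    # Trace back from the best-scoring cell until the local alignment ends
    mapping = {}
    i, j = best_cell
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            mapping[i] = j  # aligned pair (may be a mismatch)
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # residue of seq_a aligned to a gap
        else:
            j -= 1  # residue of seq_b aligned to a gap
    return mapping


# Hypothetical example: a full-length sequence and a shorter PDB chain fragment
position_map = smith_waterman_map("MKTAYIAKQR", "TAYIAK")
```

Here positions 3 to 8 of the full sequence map to positions 1 to 6 of the fragment, which is the kind of correspondence needed when a PDB chain covers only part of a protein.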
Reference:
Vazquez M, Valencia A, Pons T. (2015) Structure-PPi: a module for the annotation of cancer-related single-nucleotide variants at protein-protein interfaces. Bioinformatics 31(14):2397-2399 (doi: 10.1093/bioinformatics/btv142)
Proteomics exported tasks

| Task | Description | Examples |
| --- | --- | --- |
| sequence_position_in_pdb | Translate the positions inside a given amino-acid sequence to positions in the sequence of a PDB by aligning them | 1 |
| pdb_chain_position_in_sequence | Translate the positions of amino-acids in a particular chain of the provided PDB into positions inside a given sequence. | 1 |
| pdb_alignment_map | Find the correspondence between sequence positions in a PDB and in a given sequence. PDB positions are reported as `chain:position`. | 2 |
| neighbour_map | For a given PDB, find all pairs of residues that fall within a given 'distance' of each other. | 1 |
| neighbours_in_pdb | Use a PDB to find the residues neighbouring, in three-dimensional space, a particular residue in a given sequence. | 1 |
| mi_neighbours | Finds residues in close physical proximity to amino-acid changes in protein mutations | 3 |
| mi_interfaces | Find protein mutations with affected residues in protein-protein interaction surfaces | 2 |
| annotate_mi | Annotates protein mutations based on the protein features that overlap amino-acid changes | 2 |
| annotate_mi_neighbours | Annotates protein mutations based on the protein features that are in close physical proximity to amino-acid changes | 2 |
| annotate_dna | Annotates genomic mutations based on the protein features that overlap amino-acid changes | 2 |
| dna_neighbours | Finds residues in close physical proximity to amino-acid changes derived from genomic mutations | No |
| annotate_dna_neighbours | Annotates genomic mutations based on the protein features that are in close physical proximity to amino-acid changes | No |
| dna_interfaces | Find genomic mutations that affect residues in protein-protein interaction surfaces | 2 |
| mi_wizard | Run a list of protein variants through all the analyses and produce a combined report. This analysis is limited to 1000 variants (use the other more granular methods otherwise). The name of the protein can be `Ensembl Protein ID` or any other protein or gene identifier, including gene symbols (e.g. KRAS:G12V) | No |
| dna_wizard | Run a list of genomic variants through all the analyses and produce a combined report. This analysis is limited to 1000 variants (use the other more granular methods otherwise). | No |
| wizard | Run a list of variants through all the analyses and produce a combined report. This analysis is limited to 1000 variants (use the other more granular methods otherwise). Variants can be expressed as genomic mutations or protein mutations. When protein mutations are used, the name of the protein can be `Ensembl Protein ID` or any other protein or gene identifier, including gene symbols (e.g. KRAS:G12V) | 1 |
| scores | Score a list of variants based on the report generated by the `wizard`. The limitation to 1000 variants still holds. Used by scores_summary. | 1 |
| score_summary | Run the entire complement of analyses over a set of (genomic or protein) variants and produce a report with scores to highlight the most relevant. Limited to 1000 variants. | 1 |

-
Genomics
Genomics exported tasks

| Task | Description | Examples |
| --- | --- | --- |
| names | No documentation | 1 |

-
Translation
Genes and proteins may be referenced using a variety of identifier formats: Ensembl, Entrez, UniProt, RefSeq, Affy probes, etc. Translating between these names can be time consuming and error prone.
This workflow uses identifier translation files downloaded from Ensembl BioMart to translate gene and protein identifiers between formats. The files are downloaded separately for each organism and build, to account for changes over time that could introduce inconsistencies.
Translation exported tasks

| Task | Description | Examples |
| --- | --- | --- |
| formats | Output available identifier formats for a given organism | No |
| translate | Translate gene ids to a particular format | 1 |
| translate_from | Translate gene ids to a particular format given in another format | 1 |
| tsv_translate | Translate gene ids to a particular format. Return TSV | 1 |
| tsv_translate_from | Translate gene ids to a particular format given in another format. Return TSV | No |
| translate_protein | Translate protein ids to a particular format | No |
| translate_protein_from | Translate protein ids to a particular format given in another format | No |
| tsv_translate_protein | Translate protein ids to a particular format. Return TSV | No |
| tsv_translate_protein_from | Translate protein ids to a particular format given in another format. Return TSV | No |
| translate_probe | Translate probe ids to a particular format | No |
| translate_probe_from | Translate probe ids to a particular format given in another format | No |
| tsv_translate_probe | Translate probe ids to a particular format. Return TSV | No |
| tsv_translate_probe_from | Translate probe ids to a particular format given in another format. Return TSV | No |
| transcript_to_protein | Translate transcripts to their corresponding protein ids | No |
| tsv_translate_multiple | Translate gene ids to a particular format given in another format. Return TSV with multiple results | 1 |

-
Sequence
Finds genomic features, such as exons, that overlap genomic positions, reconstructs offsets into transcripts, and computes the amino-acid changes of variants. It also finds mutations in exon junctions and genes with high mutation frequencies.
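One of the tasks exported by this workflow, `type`, reports whether a base change is a transition, a transversion, an indel, unknown, or none at all. A minimal sketch of that classification follows; the function name `base_change_type` and the way indels are encoded here (a `-` or a multi-base allele) are assumptions, not the workflow's own input format.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def base_change_type(ref, alt):
    """Classify a base change as none, indel, unknown, transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "none"
    # Assumed encoding: '-' or a multi-base allele marks an insertion or deletion
    if len(ref) != 1 or len(alt) != 1 or "-" in (ref, alt):
        return "indel"
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        return "unknown"
    # Transitions stay within purines or within pyrimidines; transversions cross classes
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


changes = [base_change_type(r, a) for r, a in [("A", "G"), ("C", "A"), ("T", "-"), ("N", "A"), ("G", "G")]]
```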
Sequence exported tasks

| Task | Description | Examples |
| --- | --- | --- |
| add_reference | Add reference to mutations as (ref>mut) | No |
| type | Report the type of base change: transition, transversion, indel, unknown or none at all | 5 |
| splicing_mutations | Find mutations that may affect the splicing of protein coding transcripts | 4 |
| affected_genes | Finds genes affected by genomic mutations, either by amino-acid changes on their protein products, or by changes in splicing sequences | 6 |
| sequence_ontology | No documentation | No |
| binomial_significance_syn | For a list of mutations, find genes that suffer a higher rate of mutation than expected. Also considers synonymous mutations from the data. Considers only exon bases | No |
| reference | Report the reference base at the provided positions | 3 |
| gene_strand_reference | Report the reference base at the provided positions on the coding strand of the gene at that position. In case of overlap the forward (Watson) strand is used. | No |
| to_watson | No documentation | No |
| is_watson | Guess whether the mutations provided are given in the Watson strand or the gene strand | No |
| genes | Report genes overlapping positions | No |
| exons | Report exons overlapping positions | No |
| transcripts | Report transcripts overlapping positions | No |
| exon_junctions | Report exon junctions overlapping positions | 5 |
| genes_at_ranges | Report genes overlapping ranges | No |
| transcript_offsets | Computes the offsets of genomic mutations inside the coding sequences of the transcripts they overlap. | No |
| mutated_isoforms | Computes the consequence of genomic mutations in terms of amino-acid changes in protein isoforms. Optionally, only consequences in principal isoforms (as defined by Appris) are reported | 5 |
| TSS | No documentation | No |
| TSS_in_range | No documentation | No |
| mutated_isoforms_fast | One-step implementation of the `mutated_isoforms` task. Optionally, only consequences in principal isoforms (as defined by Appris) are reported | 7 |
| expanded_vcf | Expands the `INFO` and `FORMAT/Sample` fields of VCF files into a standard TSV format | No |
| genomic_mutations | Extract genomic mutations from a VCF file that match a quality criterion | No |

-
Enrichment
Functional enrichment analysis of gene lists using the hypergeometric distribution.
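For readers unfamiliar with the test, the sketch below computes a one-sided hypergeometric p-value for a single annotation term. The function name `hypergeom_pvalue`, the choice of the upper tail, and the example counts are assumptions for illustration; the workflow's own implementation, background definition, and any multiple-testing correction are not described here.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass from the observed overlap up to the largest possible one
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 annotated with the term,
# a gene list of 4, of which 2 carry the term
p_value = hypergeom_pvalue(population=20, annotated=5, sample=4, hits=2)
```

With these counts the tail sums to 1205 of the 4845 possible gene lists, a p-value of about 0.249, so an overlap of 2 would not be considered enriched.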
Enrichment exported tasks

| Task | Description | Examples |
| --- | --- | --- |
| rank_enrichment | No documentation | 1 |
| enrichment | No documentation | 1 |

-
MutationEnrichment
Enrichment analysis based on mutation frequencies
MutationEnrichment exported tasks

| Task | Description | Examples |
| --- | --- | --- |
| mutation_pathway_enrichment | No documentation | 1 |
| sample_pathway_enrichment | No documentation | 1 |
| gene_count_enrichment | No documentation | No |

-
TSVWorkflow
Utilities for TSV files
TSVWorkflow exported tasks

| Task | Description | Examples |
| --- | --- | --- |
| change_key | No documentation | 1 |
| swap_id | No documentation | 1 |
| add_id | No documentation | No |
| attach | No documentation | No |
| to_json | No documentation | No |

-
DbNSFP
DbNSFP exported tasks

| Task | Description | Examples |
| --- | --- | --- |
| annotate | No documentation | 1 |
| score | No documentation | 1 |
| predict | No documentation | 1 |
| possible_mutations | No documentation | No |

-
Appris
Appris exported tasks

| Task | Description | Examples |
| --- | --- | --- |
| principal_transcripts | No documentation | No |
| principal_isoforms | No documentation | No |

-
COSMIC
COSMIC exported tasks

| Task | Description | Examples |
| --- | --- | --- |
| coocurrence_matrix | No documentation | 1 |

-
GEO
GEO exported tasks

| Task | Description | Examples |
| --- | --- | --- |
| query | No documentation | No |
| sample_info | No documentation | No |
| matrix | No documentation | No |
| differential | No documentation | 1 |
| up_genes | No documentation | No |
| down_genes | No documentation | No |
| barcode | No documentation | No |
| rank_query | No documentation | No |
| rank_query_batch | No documentation | No |

-
Immunomics
Immunomics exported tasks

| Task | Description | Examples |
| --- | --- | --- |
| vcf_epitopes | No documentation | No |