Workflows

The following workflows are installed in this system:

Proteomics

This workflow offers several functionalities to explore the consequences of protein mutations. It reports features that overlap the mutations, or that are in close physical proximity to them.

The features reported include protein domains, variants, helices, ligand binding residues, catalytic sites, transmembrane domains, InterPro domains, and known somatic mutations in different types of cancer. This information is extracted from resources such as UniProt, COSMIC, InterPro and Appris. It can also identify mutations affecting the interfaces of protein complexes.

This workflow uses PDB files to calculate which residues are in close proximity. This information is used to find features close to the mutations, within 5 angstroms, or mutations in residues close to residues in a complex partner, within 8 angstroms.
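The proximity search described above can be pictured with a minimal sketch. It assumes each residue is reduced to a single 3D point (how the workflow actually measures residue distance is not stated here), and the coordinates, labels and the `neighbour_pairs` helper are hypothetical illustrations, not part of the workflow.

```python
import math
from itertools import combinations

def neighbour_pairs(coords, distance=5.0):
    """Return pairs of residue labels whose points lie within `distance` angstroms."""
    pairs = []
    for (a, pa), (b, pb) in combinations(sorted(coords.items()), 2):
        if math.dist(pa, pb) <= distance:
            pairs.append((a, b))
    return pairs

# Toy coordinates (hypothetical): one point per residue, labelled chain:position
coords = {
    "A:10": (0.0, 0.0, 0.0),
    "A:11": (3.0, 0.0, 0.0),
    "A:45": (0.0, 4.5, 0.0),
    "B:7": (0.0, 0.0, 7.5),
}

# Residues close to each other, using the 5 angstrom threshold
close = neighbour_pairs(coords, distance=5.0)

# Candidate interface contacts: pairs on different chains within 8 angstroms
interface = [
    (a, b)
    for a, b in neighbour_pairs(coords, distance=8.0)
    if a.split(":")[0] != b.split(":")[0]
]
```

The all-against-all comparison is quadratic in the number of residues, which is fine for a sketch; a spatial index would be the usual choice for large structures.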

PDBs are extracted from Interactome3D, which organizes thousands of PDBs, covering both experimental structures and structure models, for individual proteins and protein complexes.

Pairwise (Smith-Waterman) alignment is used to reconcile inconsistencies between the protein sequences in PDBs, UniProt and Ensembl (Ensembl Protein ID).
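As an illustration of how a local alignment can translate positions between two versions of a sequence, here is a minimal Smith-Waterman sketch. The scoring values (match 2, mismatch -1, linear gap -2), the function name and the example sequences are assumptions made for this example; the scoring scheme the workflow uses is not stated here.

```python
def smith_waterman_map(a, b, match=2, mismatch=-1, gap=-2):
    """Locally align a and b; return {1-based position in a: 1-based position in b}."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_cell = 0, (0, 0)
    # Fill the scoring matrix, remembering the highest-scoring cell
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)
    # Trace back from the best cell until the local alignment ends (score 0)
    mapping = {}
    i, j = best_cell
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            mapping[i] = j          # residues i and j are aligned to each other
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1                  # residue in a has no partner in b
        else:
            j -= 1                  # residue in b has no partner in a
    return mapping

# Hypothetical example: the structure covers only a fragment of the full sequence
full_sequence = "MKTAYIAKQRQISFVK"
pdb_sequence = "AYIAKQRQ"

position_map = smith_waterman_map(full_sequence, pdb_sequence)
pdb_position = position_map.get(7)    # position 7 of the full sequence
uncovered = position_map.get(2)       # position 2 is outside the fragment
```

Positions missing from the returned map are those the structure does not cover, so callers can tell an unmapped residue apart from a mapped one.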

Reference:

Vazquez M, Valencia A, Pons T. (2015) Structure-PPi: a module for the annotation of cancer-related single-nucleotide variants at protein-protein interfaces. Bioinformatics 31(14):2397-2399 (doi: 10.1093/bioinformatics/btv142)

Proteomics exported tasks
Task	Description	Examples
sequence_position_in_pdb	Translate the positions inside a given amino-acid sequence to positions in the sequence of a PDB by aligning them	1
pdb_chain_position_in_sequence	Translate the positions of amino-acids in a particular chain of the provided PDB into positions inside a given sequence.	1
pdb_alignment_map	Find the correspondence between sequence positions in a PDB and in a given sequence. PDB positions are reported as `chain:position`.	2
neighbour_map	For a given PDB, find all pairs of residues that fall within a given `distance` of each other.	1
neighbours_in_pdb	Use a PDB to find the residues neighbouring, in three dimensional space, a particular residue in a given sequence.	1
mi_neighbours	Finds residues in close physical proximity to amino-acid changes in protein mutations	3
mi_interfaces	Find protein mutations with affected residues in protein-protein interaction surfaces	2
annotate_mi	Annotates protein mutations based on the protein features that are overlapping amino-acid changes	2
annotate_mi_neighbours	Annotates protein mutations based on the protein features that are in close physical proximity to amino-acid changes	2
annotate_dna	Annotates genomic mutations based on the protein features that are overlapping amino-acid changes	2
dna_neighbours	Finds residues in close physical proximity to amino-acid changes derived from genomic mutations	No
annotate_dna_neighbours	Annotates genomic mutations based on the protein features that are in close physical proximity to amino-acid changes	No
dna_interfaces	Find genomic mutations that affect residues in protein-protein interaction surfaces	2
mi_wizard	Run a list of protein variants through all the analyses and produce a combined report. This analysis is limited to 1000 variants (use the other, more granular methods otherwise). The name of the protein can be an `Ensembl Protein ID` or any other protein or gene identifier, including gene symbols (e.g. KRAS:G12V)	No
dna_wizard	Run a list of genomic variants through all the analyses and produce a combined report. This analysis is limited to 1000 variants (use the other, more granular methods otherwise).	No
wizard	Run a list of variants through all the analyses and produce a combined report. This analysis is limited to 1000 variants (use the other, more granular methods otherwise). Variants can be expressed as genomic mutations or protein mutations. When protein mutations are used, the name of the protein can be an `Ensembl Protein ID` or any other protein or gene identifier, including gene symbols (e.g. KRAS:G12V)	1
scores	Score a list of variants based on the report generated by the `wizard`. The limitation to 1000 variants still holds. Used by `score_summary`.	1
score_summary	Run the entire complement of analyses over a set of (genomic or protein) variants and produce a report with scores to highlight the most relevant. Limited to 1000 variants.	1
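The wizard tasks above take protein mutations written like KRAS:G12V and cap the input at 1000 variants. A minimal sketch of validating such a list before submitting it follows. The regular expression and the `parse_protein_mutations` helper are assumptions for illustration: they handle only simple single-residue substitutions and do not reflect the full notation the workflow may accept.

```python
import re

MAX_VARIANTS = 1000  # limit stated in the wizard task descriptions

# Assumed pattern: <identifier>:<reference residue><position><new residue>
MI_PATTERN = re.compile(r"^([^:\s]+):([A-Z*])(\d+)([A-Z*])$")

def parse_protein_mutations(entries):
    """Split entries such as 'KRAS:G12V' into (protein, ref, position, alt) tuples."""
    entries = [e.strip() for e in entries if e.strip()]
    if len(entries) > MAX_VARIANTS:
        raise ValueError(f"{len(entries)} variants given, limit is {MAX_VARIANTS}")
    parsed = []
    for entry in entries:
        m = MI_PATTERN.match(entry)
        if m is None:
            raise ValueError(f"unrecognised protein mutation: {entry!r}")
        protein, ref, pos, alt = m.groups()
        parsed.append((protein, ref, int(pos), alt))
    return parsed

variants = parse_protein_mutations(["KRAS:G12V", "TP53:R175H"])
```

Checking the list locally makes it clear when it has to be split across the more granular tasks instead.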

Genomics

Genomics exported tasks
Task	Description	Examples
names	No documentation	1

Translation

Genes and proteins may be referenced using a variety of identifier formats: Ensembl, Entrez, UniProt, RefSeq, Affy probes, etc. Translating between these names can be time consuming and error prone.

This workflow uses identifier translation files downloaded from Ensembl BioMart to translate gene and protein identifiers between formats. The files are downloaded separately for each organism and build, to account for changes over time that could introduce inconsistencies.
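Conceptually, translating with such a file is a table lookup from one identifier column to another. The sketch below illustrates the idea with a tiny in-memory TSV. The column names, the table contents and the `translate_ids` helper are assumptions for this example and do not describe the workflow's actual files or interface.

```python
import csv
import io

# Toy translation table (hypothetical layout): one column per identifier format
TABLE = (
    "Ensembl Gene ID\tAssociated Gene Name\tEntrez Gene ID\n"
    "ENSG00000133703\tKRAS\t3845\n"
    "ENSG00000141510\tTP53\t7157\n"
)

def translate_ids(ids, source, target, table=TABLE):
    """Translate ids from the `source` column to the `target` column; None if unknown."""
    reader = csv.DictReader(io.StringIO(table), delimiter="\t")
    index = {row[source]: row[target] for row in reader}
    return [index.get(identifier) for identifier in ids]

ensembl_ids = translate_ids(
    ["KRAS", "TP53", "NOT_A_GENE"],
    source="Associated Gene Name",
    target="Ensembl Gene ID",
)
```

Returning `None` for unknown identifiers keeps the output aligned with the input list, so untranslated entries are easy to spot.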

Translation exported tasks
Task	Description	Examples
formats	Output available identifier formats for a given organism	No
translate	Translate gene ids to a particular format	1
translate_from	Translate gene ids given in one format to another format	1
tsv_translate	Translate gene ids to a particular format. Return TSV	1
tsv_translate_from	Translate gene ids given in one format to another format. Return TSV	No
translate_protein	Translate protein ids to a particular format	No
translate_protein_from	Translate protein ids given in one format to another format	No
tsv_translate_protein	Translate protein ids to a particular format. Return TSV	No
tsv_translate_protein_from	Translate protein ids given in one format to another format. Return TSV	No
translate_probe	Translate probe ids to a particular format	No
translate_probe_from	Translate probe ids given in one format to another format	No
tsv_translate_probe	Translate probe ids to a particular format. Return TSV	No
tsv_translate_probe_from	Translate probe ids given in one format to another format. Return TSV	No
transcript_to_protein	Translate transcripts to their corresponding protein ids	No
tsv_translate_multiple	Translate gene ids given in one format to another format. Return TSV with multiple results	1

Sequence

Finds genomic features, such as exons, that overlap genomic positions; reconstructs offsets into transcripts; and computes the amino-acid changes caused by variants. It also finds mutations in exon junctions and genes with high mutation frequencies.

Sequence exported tasks
Task	Description	Examples
add_reference	Add reference to mutations as (ref>mut)	No
type	Report the type of base change: transition, transversion, indel, unknown or none at all	5
splicing_mutations	Find mutations that may affect the splicing of protein coding transcripts	4
affected_genes	Finds genes affected by genomic mutations, either by amino-acid changes in their protein products, or by changes in splicing sequences	6
sequence_ontology	No documentation	No
binomial_significance_syn	For a list of mutations, find genes that suffer a higher rate of mutation than expected. Considers also synonymous mutations from the data. Considers only exon bases	No
reference	Report the reference base at the provided positions	3
gene_strand_reference	Report the reference base at the provided positions on the coding strand of the gene overlapping that position. In case of overlap, the forward (Watson) strand is used.	No
to_watson	No documentation	No
is_watson	Guess whether the mutations provided are given in the Watson strand or the gene strand	No
genes	Report genes overlapping positions	No
exons	Report exons overlapping positions	No
transcripts	Report transcripts overlapping positions	No
exon_junctions	Report exon junctions overlapping positions	5
genes_at_ranges	Report genes overlapping ranges	No
transcript_offsets	Computes the offset of each genomic mutation inside the coding sequence of the transcripts it overlaps.	No
mutated_isoforms	Computes the consequence of genomic mutations in terms of amino-acid changes in protein isoforms case, only consequences in principal isoforms will be reported (as defined by Appris)	5
TSS	No documentation	No
TSS_in_range	No documentation	No
mutated_isoforms_fast	One-step implementation of the `mutated_isoforms` task case, only consequences in principal isoforms will be reported (as defined by Appris)	7
expanded_vcf	Expands the `INFO` and `FORMAT/Sample` fields of VCF files into a standard TSV format	No
genomic_mutations	Extract genomic mutations from a VCF file that match given quality criteria	No
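The categories reported by the `type` task (transition, transversion, indel, unknown, none) follow the standard classification of base changes: a transition swaps two purines (A, G) or two pyrimidines (C, T), and a transversion swaps a purine for a pyrimidine or vice versa. A minimal sketch of that classification is given here. The `change_type` function and its conventions for encoding indels (`-` or multi-base alleles) are assumptions for illustration, not the task's actual input format.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def change_type(ref, alt):
    """Classify a base change as transition, transversion, indel, unknown or none."""
    ref, alt = ref.upper(), alt.upper()
    # Assumed convention: '-' or anything other than one base denotes an indel
    if len(ref) != 1 or len(alt) != 1 or "-" in (ref, alt):
        return "indel"
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        return "unknown"
    if ref == alt:
        return "none"
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

examples = {
    "A>G": change_type("A", "G"),
    "A>C": change_type("A", "C"),
    "C>C": change_type("C", "C"),
    "A>-": change_type("A", "-"),
    "N>A": change_type("N", "A"),
}
```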

Enrichment

Functional enrichment analysis of gene lists using the hypergeometric distribution.
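For reference, a hypergeometric enrichment p-value is the probability of seeing at least the observed overlap between a gene list and an annotated gene set when genes are drawn at random without replacement. The sketch below computes that upper tail directly. The function name and the toy numbers are illustrative assumptions, and any background definition or multiple-testing correction the workflow applies is not shown.

```python
from math import comb

def hypergeom_pvalue(overlap, list_size, set_size, background_size):
    """P(X >= overlap) when drawing list_size genes from background_size genes,
    of which set_size belong to the annotated set."""
    total = comb(background_size, list_size)
    upper = min(list_size, set_size)
    tail = sum(
        comb(set_size, i) * comb(background_size - set_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

# Toy numbers: 10 background genes, 4 in the pathway, 3 in the list, 2 shared
p_value = hypergeom_pvalue(overlap=2, list_size=3, set_size=4, background_size=10)
```

With these toy numbers the tail sums 36 + 4 favourable draws out of 120, giving a p-value of one third.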

Enrichment exported tasks
Task	Description	Examples
rank_enrichment	No documentation	1
enrichment	No documentation	1

MutationEnrichment

Enrichment analysis based on mutation frequencies

MutationEnrichment exported tasks
Task	Description	Examples
mutation_pathway_enrichment	No documentation	1
sample_pathway_enrichment	No documentation	1
gene_count_enrichment	No documentation	No

TSVWorkflow

Utilities for TSV files

TSVWorkflow exported tasks
Task	Description	Examples
change_key	No documentation	1
swap_id	No documentation	1
add_id	No documentation	No
attach	No documentation	No
to_json	No documentation	No

DbNSFP

DbNSFP exported tasks
Task	Description	Examples
annotate	No documentation	1
score	No documentation	1
predict	No documentation	1
possible_mutations	No documentation	No

Appris

Appris exported tasks
Task	Description	Examples
principal_transcripts	No documentation	No
principal_isoforms	No documentation	No

COSMIC

COSMIC exported tasks
Task	Description	Examples
coocurrence_matrix	No documentation	1

GEO

GEO exported tasks
Task	Description	Examples
query	No documentation	No
sample_info	No documentation	No
matrix	No documentation	No
differential	No documentation	1
up_genes	No documentation	No
down_genes	No documentation	No
barcode	No documentation	No
rank_query	No documentation	No
rank_query_batch	No documentation	No

Immunomics

Immunomics exported tasks
Task	Description	Examples
vcf_epitopes	No documentation	No