Discover and run vetted analysis pipelines on Galaxy
Ready-to-use, open-source pipelines with sample data and training materials to make progress quickly and reliably.
Get started with one of our most popular workflows, or browse the full library below.
Complete RNA-Seq analysis for paired-end data: Processes raw FASTQ data through adapter and bad quality removal (fastp), alignment with STAR using ENCODE parameters, gene quantification via multiple methods (STAR and featureCounts), and expression calculation (FPKM with Cufflinks/StringTie, normalized coverage with bedtools). Produces count tables, normalized expression values, and genomic coverage tracks. Supports stranded and unstranded libraries, generating both HTSeq-compatible counts and normalized measures for downstream analysis.
1.2
Updated on Jan 27, 2025
Complete ChIP-seq analysis for paired-end sequencing data. Processes raw FASTQ files through adapter removal (cutadapt), alignment to reference genome (Bowtie2), and stringent quality filtering (MAPQ >= 30, concordant pairs only). Peak calling with MACS2 optimized for paired-end reads identifies protein-DNA binding sites. Generates alignment files, peak calls, and quality metrics for downstream analysis.
0.14
Updated on Mar 10, 2025
Complete ATAC-seq analysis pipeline for paired-end reads. Processes raw FASTQ data through adapter and bad quality removal (cutadapt), alignment (Bowtie2 end-to-end), and filtering (removes MT reads, discordant pairs, low mapping quality <30, PCR duplicates). Generates 5' cut site pileups (±100bp), performs peak calling, and quantifies reads in 1kb summit-centered regions. Produces two normalized coverage tracks (per million mapped reads and per million reads in peaks) and fragment length distribution plots for quality assessment.
1.0
Updated on Nov 28, 2024
This workflow generates Hi-C contact maps for genome assemblies in the Pretext format. It is compatible with one or two haplotypes and includes tracks for PacBio read coverage, gaps, and telomeres. The Pretext files can be opened in PretextView for the manual curation of genome assemblies.
1.0beta2
Updated on Mar 28, 2025
This workflow performs scaffolding of a genome assembly using Hi-C data with YAHS. Part of the VGP set of workflows.
1.3
Updated on Mar 27, 2025
Identifies differentially expressed genes between exactly two experimental conditions from count tables. The workflow performs statistical testing, applies filters based on adjusted p-value and log2 fold change thresholds, and generates publication-quality visualizations including volcano plots, MA plots, and heatmaps. Takes two collections of count tables as input and produces filtered gene lists and interactive plots for interpreting expression differences. Optimal for simple two-condition experimental designs.
0.4
Updated on Mar 24, 2025
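The filtering step of the differential expression workflow described above can be sketched roughly as follows. This is a minimal illustration, not the workflow's own code: the field names (`padj`, `log2fc`), the default thresholds, and the function name `filter_de_genes` are all assumptions made for the example.

```python
def filter_de_genes(rows, padj_max=0.05, abs_log2fc_min=1.0):
    """Keep rows that pass both the adjusted p-value and |log2FC| cutoffs.

    `rows` is a list of dicts with hypothetical keys "gene", "log2fc", "padj".
    """
    kept = []
    for row in rows:
        padj = row.get("padj")
        if padj is None:
            # Genes without a computed adjusted p-value cannot be tested.
            continue
        if padj < padj_max and abs(row["log2fc"]) >= abs_log2fc_min:
            kept.append(row)
    return kept


# Hypothetical results table for five genes.
genes = [
    {"gene": "geneA", "log2fc": 2.3, "padj": 0.001},
    {"gene": "geneB", "log2fc": 0.4, "padj": 0.0001},  # fold change too small
    {"gene": "geneC", "log2fc": -1.8, "padj": 0.2},    # not significant
    {"gene": "geneD", "log2fc": -3.0, "padj": 0.01},
    {"gene": "geneE", "log2fc": 1.5, "padj": None},    # no adjusted p-value
]

significant = filter_de_genes(genes)
print([g["gene"] for g in significant])  # ['geneA', 'geneD']
```

Both up- and down-regulated genes are kept because the cutoff is applied to the absolute log2 fold change.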
The workflow for Illumina-sequenced ARTIC data builds on the RNASeq workflow for paired-end data, using the same steps for mapping and variant calling, but adds extra logic for trimming ARTIC primer sequences off reads with the ivar package. In addition, this workflow uses ivar to identify amplicons affected by ARTIC primer-binding site mutations and tries to exclude reads derived from such tainted amplicons when calculating allele frequencies of other variants.
0.5.4
Updated on Mar 17, 2025
0.9.2
Updated on Mar 17, 2025
Purge contigs marked as duplicates by purge_dups (either haplotypic duplication or overlap duplication). This workflow is the 6th workflow of the VGP pipeline. It is meant to be run after one of the contigging steps (Workflow 3, 4, or 5).
0.8
Updated on Mar 17, 2025
Assemble a genome using PacBio HiFi and Hi-C data for phasing. Prerequisite: run the k-mer profiling workflow (VGP1).
0.3.2
Updated on Mar 17, 2025
0.1.4
Updated on Mar 17, 2025
Purge duplicates from one haplotype. Prerequisites: run after a k-mer profiling workflow (VGP 1 or 2) and a contigging workflow (VGP 3, 4, or 5).
0.7.5
Updated on Mar 17, 2025
A workflow for the analysis of pox virus genomes sequenced as half-genomes (for ITR resolution) in a tiled-amplicon approach
0.3
Updated on Mar 17, 2025
Applies baredSC algorithm to fit and combine one-dimensional Gaussian mixture models (from 1 to N components) on log-normalized single-cell gene expression data. Enables identification of subpopulations based on expression of genes of interest and provides statistical assessment of the optimal number of components in heterogeneous cell populations.
0.6
Updated on Mar 10, 2025
Applies baredSC algorithm to fit and combine two-dimensional Gaussian mixture models (from 1 to N components) on log-normalized single-cell gene expression data. Enables identification of subpopulations based on co-expression patterns of two genes of interest and provides statistical assessment of the optimal number of components in heterogeneous cell populations.
0.6
Updated on Mar 10, 2025
Comprehensive preprocessing for 10X Genomics CellPlex multiplexed single-cell RNA-seq data. Processes Cell Multiplexing Oligo (CMO) FASTQ files with CITE-seq-Count including required CellPlex-specific translation steps. Simultaneously processes gene expression FASTQ files with STARsolo and DropletUtils for alignment and cell filtering, and formats outputs for seamless import into Seurat/Scanpy (Read10X function).
0.6.2
Updated on Mar 10, 2025
Complete preprocessing pipeline for 10X Genomics v3 single-cell RNA-seq data. Aligns raw FASTQ files using STARsolo, performs cell calling and quality filtering with DropletUtils, and formats outputs for seamless import into Seurat/Scanpy (Read10X function).
0.6.2
Updated on Mar 10, 2025
This workflow performs single-end read mapping with bowtie2, followed by sensitive variant calling across a wide range of allele frequencies (AFs) with lofreq.
0.1.6
Updated on Mar 10, 2025
Build a consensus sequence from FILTER PASS variants with intrasample allele-frequency above a configurable consensus threshold. Hard-mask regions with low coverage (but not consensus variants within them) and ambiguous sites.
0.4.3
Updated on Mar 10, 2025
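The consensus-building logic described above can be illustrated with a small sketch. It is a simplification under stated assumptions: only single-nucleotide variants are handled, positions are 0-based, ambiguous-site masking is left out, and the function name, dictionary keys, and default thresholds are hypothetical rather than taken from the workflow.

```python
def build_consensus(reference, variants, coverage,
                    af_threshold=0.7, min_coverage=5):
    """Apply PASS variants above the AF threshold, then mask low coverage.

    reference: reference sequence as a string
    variants:  list of dicts with hypothetical keys "pos" (0-based), "alt",
               "af" (allele frequency) and "filter"
    coverage:  per-position read depth, same length as `reference`
    """
    sequence = list(reference)
    consensus_positions = set()

    # Step 1: incorporate variants that pass the filter and the AF threshold.
    for variant in variants:
        if variant["filter"] == "PASS" and variant["af"] > af_threshold:
            sequence[variant["pos"]] = variant["alt"]
            consensus_positions.add(variant["pos"])

    # Step 2: hard-mask low-coverage positions, but keep consensus variants.
    for position, depth in enumerate(coverage):
        if depth < min_coverage and position not in consensus_positions:
            sequence[position] = "N"

    return "".join(sequence)


# Hypothetical 8-base reference with four candidate variants.
reference = "ACGTACGT"
variants = [
    {"pos": 1, "alt": "T", "af": 0.95, "filter": "PASS"},
    {"pos": 4, "alt": "G", "af": 0.30, "filter": "PASS"},     # AF too low
    {"pos": 6, "alt": "A", "af": 0.90, "filter": "LowQual"},  # not PASS
    {"pos": 7, "alt": "C", "af": 0.99, "filter": "PASS"},     # low coverage, kept
]
coverage = [30, 30, 30, 2, 30, 30, 1, 1]

consensus = build_consensus(reference, variants, coverage)
print(consensus)  # ATGNACNC
```

The last position shows the exception mentioned in the description: it lies in a low-coverage region but is not masked, because a consensus variant was called there.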
This workflow creates taxonomic summary tables for a specified taxonomic rank out of MAPseq's OTU tables output collection.
0.2
Updated on Mar 10, 2025
Classification and visualization of ITS regions.
0.2
Updated on Mar 10, 2025
MGnify's amplicon pipeline v5.0, including the quality control for single-end and paired-end reads, rRNA prediction, and ITS subworkflows.
0.2
Updated on Mar 10, 2025
Classification and visualization of SSU, LSU sequences.
0.2
Updated on Mar 10, 2025
The MAPseq to Ampvis workflow processes MAPseq OTU tables and associated metadata for analysis in Ampvis2. This workflow involves reformatting MAPseq output datasets to produce structured output files suitable for Ampvis2.
0.2
Updated on Mar 10, 2025
This workflow creates taxonomic summary tables out of the amplicon pipeline results.
0.2
Updated on Mar 10, 2025
Quality control subworkflow for single-end reads.
0.3
Updated on Mar 10, 2025
Quality control subworkflow for paired-end reads.
0.2
Updated on Mar 10, 2025
Complete ChIP-seq analysis for single-end sequencing data. Processes raw FASTQ files through adapter removal (cutadapt), alignment to reference genome (Bowtie2), and quality filtering (MAPQ >= 30). Peak calling with MACS2 uses either a fixed extension parameter or built-in model to identify protein-DNA binding sites. Generates alignment files, peak calls, and quality metrics for downstream analysis.
0.14
Updated on Mar 10, 2025
Complete ChIP-seq analysis for paired-end sequencing data. Processes raw FASTQ files through adapter removal (cutadapt), alignment to reference genome (Bowtie2), and stringent quality filtering (MAPQ >= 30, concordant pairs only). Peak calling with MACS2 optimized for paired-end reads identifies protein-DNA binding sites. Generates alignment files, peak calls, and quality metrics for downstream analysis.
0.14
Updated on Mar 10, 2025
Microbiome - Variant calling and Consensus Building
0.1.4
Updated on Mar 10, 2025
This workflow allows for genome annotation using Maker and evaluates the quality of the annotation.
0.1
Updated on Mar 6, 2025
This workflow runs the FEELnc tool to annotate long non-coding RNAs. Before these long non-coding RNAs are annotated, StringTie is used to assemble the RNA-seq alignments into potential transcripts. The gffread tool provides a genome annotation file in GTF format.
0.1
Updated on Mar 5, 2025
DADA2 amplicon analysis for paired-end data. The workflow has three main outputs: the sequence table (output of makeSequenceTable), the taxonomy (output of assignTaxonomy), and the counts, which track the number of sequences in the samples through the steps (output of sequence counts).
0.3
Updated on Feb 17, 2025
Identifies high-confidence consensus peaks from ChIP-seq single-end replicate experiments. The workflow calls peaks on individual replicates and identifies their intersection. To control for sequencing depth differences, it subsamples all replicates to the smallest library size, performs peak calling on the combined normalized data, and retains only peaks whose summits overlap with intersections from a user-defined minimum number of replicates.
1.3
Updated on Jan 27, 2025
Identifies high-confidence consensus peaks from ChIP-seq paired-end replicate experiments. The workflow calls peaks on individual replicates and identifies their intersection. To control for sequencing depth differences, it subsamples all replicates to the smallest library size, performs peak calling on the combined normalized data, and retains only peaks whose summits overlap with intersections from a user-defined minimum number of replicates.
1.3
Updated on Jan 27, 2025
Identifies high-confidence consensus peaks from ATAC-seq or CUT&RUN replicate experiments. The workflow calls peaks on individual replicates and identifies their intersection. To control for sequencing depth differences, it subsamples all replicates to the smallest library size, performs peak calling on the combined normalized data, and retains only peaks whose summits overlap with intersections from a user-defined minimum number of replicates.
1.3
Updated on Jan 27, 2025
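The final filtering step shared by the three consensus-peak workflows above (keeping only peaks whose summits are supported by a minimum number of replicates) can be sketched as follows. This is a simplified illustration, not the workflows' implementation: it assumes a single chromosome, half-open `(start, end)` intervals, and hypothetical function names, and it counts per-replicate support directly instead of computing pairwise intersections first.

```python
def contains(peaks, position):
    """Return True if any half-open (start, end) peak covers `position`."""
    return any(start <= position < end for start, end in peaks)


def filter_by_replicate_support(summits, replicate_peaks, min_replicates):
    """Keep summits that fall inside peaks of at least `min_replicates` replicates.

    summits:         summit positions called on the combined, subsampled data
    replicate_peaks: one list of (start, end) peaks per replicate
    """
    kept = []
    for summit in summits:
        support = sum(1 for peaks in replicate_peaks if contains(peaks, summit))
        if support >= min_replicates:
            kept.append(summit)
    return kept


# Hypothetical peaks from three replicates on one chromosome.
replicate_peaks = [
    [(100, 200), (500, 600)],
    [(150, 250), (900, 1000)],
    [(120, 220), (520, 580)],
]
# Hypothetical summits from peak calling on the combined data.
summits = [160, 550, 950]

print(filter_by_replicate_support(summits, replicate_peaks, min_replicates=2))
# [160, 550]
```

Raising `min_replicates` to 3 in this example would keep only the summit at position 160, which is covered in all three replicates.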
Complete CUT&RUN/CUT&TAG analysis workflow for paired-end sequencing data. Processes raw FASTQ files through adapter removal (cutadapt) and alignment (Bowtie2 with dovetail option enabled). Applies quality filtering (MAPQ ≥ 30, concordant pairs only), converts BAM to BED format, and performs peak calling using MACS2 with parameters optimized for the punctate signal profile characteristic of CUT&RUN/CUT&TAG experiments.
0.14
Updated on Jan 27, 2025
Complete RNA-Seq analysis for paired-end data: Processes raw FASTQ data through adapter and bad quality removal (fastp), alignment with STAR using ENCODE parameters, gene quantification via multiple methods (STAR and featureCounts), and expression calculation (FPKM with Cufflinks/StringTie, normalized coverage with bedtools). Produces count tables, normalized expression values, and genomic coverage tracks. Supports stranded and unstranded libraries, generating both HTSeq-compatible counts and normalized measures for downstream analysis.
1.2
Updated on Jan 27, 2025
Complete RNA-Seq analysis for single-end data: Processes raw FASTQ data through adapter and bad quality removal (fastp), alignment with STAR using ENCODE parameters, gene quantification via multiple methods (STAR and featureCounts), and expression calculation (FPKM with Cufflinks/StringTie, normalized coverage with bedtools). Produces count tables, normalized expression values, and genomic coverage tracks. Supports stranded and unstranded libraries, generating both HTSeq-compatible counts and normalized measures for downstream analysis.
1.2
Updated on Jan 27, 2025
This workflow allows you to annotate a genome with Helixer and evaluate the quality of the annotation using BUSCO and Genome Annotation statistics. GFFRead is also used to predict protein sequences derived from this annotation, and BUSCO and OMArk are used to assess proteome quality.
0.2
Updated on Jan 27, 2025
This workflow performs subtyping and consensus sequence generation for batches of Illumina PE sequenced Influenza A isolates.
0.1
Updated on Jan 9, 2025
Performs k-mer profiling on PacBio data and generates GenomeScope plots and summary for genome characteristics assessment.
0.1.9
Updated on Dec 17, 2024
In proteomics research, verifying detected peptides is essential for ensuring data accuracy and biological relevance. This tutorial continues from the clinical metaproteomics discovery workflow, focusing on verifying identified microbial peptides using the PepQuery tool.
0.2
Updated on Dec 16, 2024
This workflow uses eggNOG mapper and InterProScan for functional annotation of protein sequences.
0.1
Updated on Dec 4, 2024
Create Meryl Database used for the estimation of assembly parameters and quality control with Merqury. Part of the VGP pipeline.
0.1.5
Updated on Dec 3, 2024
Complete ATAC-seq analysis pipeline for paired-end reads. Processes raw FASTQ data through adapter and bad quality removal (cutadapt), alignment (Bowtie2 end-to-end), and filtering (removes MT reads, discordant pairs, low mapping quality <30, PCR duplicates). Generates 5' cut site pileups (±100bp), performs peak calling, and quantifies reads in 1kb summit-centered regions. Produces two normalized coverage tracks (per million mapped reads and per million reads in peaks) and fragment length distribution plots for quality assessment.
1.0
Updated on Nov 28, 2024
This workflow will perform taxonomic and functional annotations using Unipept and statistical analysis using MSstatsTMT.
0.1
Updated on Nov 19, 2024
Performs differential gene expression analysis on single-cell data using a pseudobulk approach. Aggregates cell-level counts from an annotated AnnData object by cell type or other metadata using Decoupler, then applies edgeR for robust statistical testing between conditions. Includes data preprocessing steps for compatibility with edgeR and generates interactive volcano plots to visualize significantly differentially expressed genes.
0.1.1
Updated on Nov 18, 2024
Workflow for clinical metaproteomics database searching
0.1
Updated on Nov 18, 2024
The workflow begins with the Database Generation process. The Galaxy-P team has developed a workflow that collects protein sequences from known disease-causing microorganisms to build a comprehensive database. This extensive database is then refined into a smaller, more relevant dataset using the Metanovo tool.
0.1
Updated on Nov 18, 2024
Assembly of bacterial paired-end short read data with generation of quality metrics and reports
1.1.5
Updated on Nov 18, 2024
Short paired-end read analysis providing quality analysis, read cleaning, and taxonomic assignment
1.1.6
Updated on Nov 18, 2024
This workflow performs segmentation and counting of cell nuclei using fluorescence microscopy images. The segmentation step is performed using Otsu thresholding (Otsu, 1979). The workflow is based on the tutorial: https://training.galaxyproject.org/training-material/topics/imaging/tutorials/imaging-introduction/tutorial.html
0.2
Updated on Nov 7, 2024
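For readers unfamiliar with the Otsu thresholding mentioned above, the sketch below shows the core idea in plain Python: choose the intensity threshold that maximizes the between-class variance of background and foreground pixels. It assumes integer intensities in the range 0 to 255 supplied as a flat list, and it covers only the thresholding step, not the nuclei counting. The function name is hypothetical and this is not the tool used in the workflow.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance.

    Pixels with value <= t are treated as background, the rest as foreground.
    """
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    weighted_total = sum(level * count for level, count in enumerate(histogram))

    best_threshold = 0
    best_variance = -1.0
    background_count = 0
    background_sum = 0

    for level in range(levels):
        background_count += histogram[level]
        background_sum += level * histogram[level]
        foreground_count = total - background_count
        if background_count == 0:
            continue  # no background pixels yet at this threshold
        if foreground_count == 0:
            break     # all pixels are background from here on
        mean_background = background_sum / background_count
        mean_foreground = (weighted_total - background_sum) / foreground_count
        variance = (background_count * foreground_count
                    * (mean_background - mean_foreground) ** 2)
        if variance > best_variance:
            best_variance = variance
            best_threshold = level

    return best_threshold


# Hypothetical image: 100 dark background pixels and 40 bright "nuclei" pixels.
pixels = [10] * 50 + [12] * 50 + [200] * 20 + [210] * 20
threshold = otsu_threshold(pixels)
foreground = [p for p in pixels if p > threshold]
print(threshold, len(foreground))
```

Counting individual nuclei would additionally require labeling connected foreground regions, which is outside the scope of this sketch.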
Use DADA2 for sequence quality control. DADA2 is a pipeline for detecting and correcting (where possible) Illumina amplicon sequence data. As implemented in the q2-dada2 plugin, this quality control process will additionally filter any phiX reads (commonly present in marker gene Illumina sequence data) that are identified in the sequencing data, and will filter chimeric sequences.
0.3
Updated on Nov 4, 2024
Use DADA2 for sequence quality control. DADA2 is a pipeline for detecting and correcting (where possible) Illumina amplicon sequence data. As implemented in the q2-dada2 plugin, this quality control process will additionally filter any phiX reads (commonly present in marker gene Illumina sequence data) that are identified in the sequencing data, and will filter chimeric sequences.
0.3
Updated on Nov 4, 2024
Importing single-end multiplexed data (not demultiplexed yet)
0.3
Updated on Nov 4, 2024
Importing paired-end multiplexed data (not demultiplexed yet)
0.3
Updated on Nov 4, 2024
Importing demultiplexed data (single-end)
0.3
Updated on Nov 4, 2024
Importing demultiplexed data (paired-end)
0.3
Updated on Nov 4, 2024
This workflow reconstructs a phylogeny (by inserting fragments into a reference), performs alpha rarefaction analysis, and performs taxonomic analysis.
0.2
Updated on Nov 4, 2024
The first step in hypothesis testing in microbial ecology is typically to look at within- (alpha) and between-sample (beta) diversity. We can calculate diversity metrics, apply appropriate statistical tests, and visualize the data using the q2-diversity plugin.
0.2
Updated on Nov 4, 2024
Performs functional enrichment analysis of gene sets using GOseq, identifying over-represented Gene Ontology terms and KEGG pathways. This workflow accounts for gene length bias in RNA-seq data when calculating enrichment statistics. Input requires differentially expressed genes and gene lengths. Generates comprehensive tables and visualizations of enriched GO terms across all three ontologies (Biological Process, Molecular Function, Cellular Component) as well as KEGG pathway enrichment results.
0.1
Updated on Nov 3, 2024
Workflow for variant analysis against a reference genome in GenBank format
0.1
Updated on Oct 29, 2024
Antimicrobial resistance gene detection from assembled bacterial genomes
1.1.5
Updated on Oct 21, 2024
Annotation of assembled bacterial genomes to detect genes, potential plasmids, integrons, and insertion sequence (IS) elements.
1.1.7
Updated on Oct 21, 2024
End-to-end analysis of single-cell RNA-seq data using the Scanpy/AnnData ecosystem. Imports count matrices, applies quality control filtering of low-quality cells and genes, normalizes and scales data, performs dimension reduction (PCA), and identifies cell clusters using the Louvain algorithm. Generates publication-ready visualizations including UMAP plots colored by clusters and marker genes.
0.1
Updated on Oct 9, 2024
Extends 3' ends of gene annotations using BAM files (from STAR alignments) and a reference GTF. Specifically designed for 3'-biased sequencing techniques like 10X scRNA-seq or BRB-seq that primarily capture transcript 3' ends. The BREW3R tool enhances annotations by using evidence from RNA-seq data to improve 3' UTR definitions, which is particularly important for accurate quantification in single-cell and bulk RNA-seq experiments.
0.2
Updated on Oct 7, 2024
This workflow takes a VCF dataset of variants produced by any of the *-variant-calling workflows in https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling and generates tabular lists of variants by Samples and by Variant, and an overview plot of variants and their allele-frequencies.
0.3.4
Updated on Sep 24, 2024
Clinical Metaproteomics 4: Quantitation
0.1
Updated on Aug 14, 2024
This workflow takes as input an SRA_manifest from SRA Run Selector and generates one FASTQ file or one pair of FASTQ files for each experiment (concatenating multiple runs if necessary). Outputs are relabelled to match the column specified by the user.
0.7
Updated on Jun 17, 2024
Automated inference of stable isotope incorporation rates in proteins for functional metaproteomics
0.2
Updated on Jun 17, 2024
Downloads fastq files for sequencing run accessions provided in a text file using fasterq-dump. Creates one job per listed run accession.
0.1.14
Updated on May 27, 2024
Microbiome - QC and Contamination Filtering
0.1
Updated on Apr 25, 2024
Microbiome - Taxonomy Profiling
0.1
Updated on Apr 25, 2024
Pathogens of all samples report generation and visualization
0.1
Updated on Apr 24, 2024
Nanopore dataset analysis: phylogenetic identification, antibiotic resistance gene detection, and contig building
0.1
Updated on Apr 18, 2024
Assemble long reads with Flye, then view assembly statistics and assembly graph
0.2
Updated on Mar 25, 2024
Processes 10X Genomics single-cell RNA-seq data using Velocyto to quantify spliced and unspliced transcript counts for RNA velocity analysis. Automatically extracts cell barcodes from standard 10X bundled output and generates a loom file containing separate counts for spliced exons, unspliced introns, and ambiguous regions. Enables downstream trajectory inference and cellular dynamics analysis.
0.2
Updated on Feb 5, 2024
Processes 10X Genomics single-cell RNA-seq data using Velocyto to quantify spliced and unspliced transcript counts for RNA velocity analysis. Takes pre-filtered cell barcodes as input and generates a loom file containing separate counts for spliced exons, unspliced introns, and ambiguous regions. Enables downstream trajectory inference and cellular dynamics analysis in tools like scVelo.
0.2
Updated on Feb 5, 2024
MMGBSA simulation and calculation
0.1.5
Updated on Nov 27, 2023
This workflow combines the XCMS R package (Smith, C.A. 2006), used for extraction, with the metaMS R package (Wehrens, R 2014) for untargeted metabolomics. https://training.galaxyproject.org/training-material/topics/metabolomics/tutorials/gcms/tutorial.html
0.1
Updated on Nov 22, 2023
This workflow uses the XCMS R package (Smith, C.A. 2006) to extract, filter, align, and fill gaps, with the option to annotate isotopes, adducts, and fragments using the CAMERA R package (Kuhl, C 2012). https://training.galaxyproject.org/training-material/topics/metabolomics/tutorials/lcms-preprocessing/tutorial.html
1.0
Updated on Nov 22, 2023
This workflow performs paired-end read mapping with bwa-mem, followed by sensitive variant calling across a wide range of allele frequencies (AFs) with lofreq.
0.2.4
Updated on Nov 20, 2023
This workflow for ONT-sequenced ARTIC data is modeled after the alignment/variant-calling steps of the [ARTIC pipeline](https://artic.readthedocs.io/en/latest/). It performs essentially the same steps as that pipeline's minion command, i.e. read mapping with minimap2 and variant calling with medaka. Like the Illumina ARTIC workflow, it uses ivar for primer trimming. Since ONT-sequenced reads have a much higher error rate than Illumina-sequenced reads and are therefore more prone to false-positive variant calls, this workflow makes no attempt to handle amplicons affected by potential primer-binding site mutations.
0.3.2
Updated on Nov 20, 2023
Virtual screening of the SARS-CoV-2 main protease with rDock and pose scoring
0.1.5
Updated on Nov 20, 2023
Perform dcTMD free energy simulations and calculations
0.1.5
Updated on Nov 20, 2023
Workflow for variant analysis against a reference genome in GenBank format
0.1.1
Updated on Nov 20, 2023
This workflow takes a VCF dataset of variants produced by any of the variant calling workflows in https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling and generates tabular lists of variants by Samples and by Variant, and an overview plot of variants and their allele-frequencies.
0.1.1
Updated on Nov 20, 2023
Calculates average signal values across replicate BigWig files to generate a single consolidated track. Requires input file identifiers formatted as "sample_name_replicateID". The tool automatically groups files by sample name, computes the mean signal at each position, and outputs consolidated BigWig files named after the common sample prefix (without the replicate ID suffix).
0.2
Updated on Sep 27, 2023
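The grouping and averaging described above can be illustrated with a short sketch. It assumes that the replicate ID is everything after the last underscore in the identifier and represents each track as a plain list of per-position values instead of a real BigWig file. Both function names are hypothetical.

```python
from collections import defaultdict


def group_by_sample(identifiers):
    """Group identifiers of the form "sample_name_replicateID" by sample name.

    Assumption: the replicate ID is the part after the last underscore.
    """
    groups = defaultdict(list)
    for identifier in identifiers:
        sample, separator, _replicate = identifier.rpartition("_")
        if not separator or not sample:
            raise ValueError(f"no replicate suffix in identifier: {identifier!r}")
        groups[sample].append(identifier)
    return dict(groups)


def mean_signal(tracks):
    """Average equally long signal tracks position by position."""
    return [sum(values) / len(values) for values in zip(*tracks)]


# Hypothetical identifiers and per-position signal values.
signals = {
    "wt_liver_rep1": [1.0, 2.0, 3.0],
    "wt_liver_rep2": [3.0, 4.0, 5.0],
    "ko_liver_rep1": [0.0, 8.0, 2.0],
}

groups = group_by_sample(signals)
averaged = {
    sample: mean_signal([signals[name] for name in names])
    for sample, names in groups.items()
}
print(averaged)
# {'wt_liver': [2.0, 3.0, 4.0], 'ko_liver': [0.0, 8.0, 2.0]}
```

Splitting at the last underscore lets sample names contain underscores themselves, which matches the "sample_name_replicateID" pattern quoted in the description.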
Comprehensive Hi-C data processing workflow that converts paired-end FASTQ files to balanced contact matrices. Uses HiCUP for pre-processing (adapter trimming, mapping, filtering) with fragment midpoint coordinate assignment, filters reads by mapping quality, and generates sorted tabix files. The final output is a balanced cool file at user-specified resolution created with cooler, ready for multi-resolution contact matrix analysis.
0.3
Updated on Sep 8, 2023
End-to-end processing of Capture Hi-C paired-end sequencing data. Transforms raw FASTQ files into balanced contact matrices using HiCUP for processing Hi-C reads (adapter trimming, mapping, filtering for valid pairs) with additional filtering for MAPQ and captured regions. The workflow generates sorted tabix files via cooler and produces balanced cool files at user-specified resolution for downstream analysis and visualization.
0.3
Updated on Sep 8, 2023
Converts Hi-C interaction data from Juicer medium format (tabix files) to balanced cooler format. This streamlined workflow takes a collection of Juicer medium tabix files and a reference genome name as input and generates balanced cool files at user-specified resolution. Enables seamless transition between different Hi-C analysis ecosystems while maintaining data integrity.
0.3
Updated on Sep 8, 2023
Processes Hi-C paired-end FASTQ files to generate validated interaction pairs using HiCUP. The workflow truncates reads at ligation junctions, maps to reference genome, assigns to restriction fragments, and filters out experimental artifacts (self-ligated, dangling ends, internal fragments, and size outliers). Removes duplicates and converts outputs to formats compatible with Juicebox/cooler using fragment midpoints as coordinates. Final filtering by mapping quality ensures high-confidence interaction data.
0.3
Updated on Sep 8, 2023
Racon polish with long reads, x4
0.1
Updated on Jul 15, 2023
Find and annotate variants in ampliconic SARS-CoV-2 Illumina sequencing data and classify samples with pangolin and nextclade
0.2.3
Updated on Nov 22, 2022