111 workflows
Purge duplicate contigs from a diploid assembly VGP6
Purge contigs marked as duplicates by purge_dups.
Purging duplicates in one haplotype VGP6b
Purge contigs marked as duplicates by purge_dups in a single haplotype (could be haplotypic duplication or overlap duplication). If you think the purged contigs might belong to the other haplotype, use the workflow VGP6 instead. This workflow is the 6th workflow of the VGP pipeline. It is meant to be run after one of the contigging steps (Workflow 3, 4, or 5).
Host or Contamination Removal on Short-Reads
This workflow takes paired-end Illumina fastq(.gz) files and runs Bowtie to map the reads against a reference genome (human, by default) and keep only the reads that do not align. MultiQC is used to aggregate the mapping reports.
Host or Contamination Removal on Long-Reads
This workflow takes Nanopore fastq(.gz) files and runs Minimap2 to map the reads against a reference genome (human, by default). It filters the output to keep only the unmapped reads and generates mapping statistics that are aggregated into a MultiQC report.
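Both host-removal workflows keep only reads that do not align to the reference. The descriptions do not name the exact filtering step, so the following is a minimal sketch that assumes SAM-formatted alignments and treats a record as unmapped when bit 0x4 of its FLAG field is set. The function name `unmapped_read_names` is illustrative only.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def unmapped_read_names(sam_lines):
    """Yield the names of reads whose SAM FLAG has the 0x4 (unmapped) bit set."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & UNMAPPED:
            yield qname


sam = [
    "@HD\tVN:1.6",
    "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",
    "read2\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF",
    "read3\t16\tchr2\t500\t42\t4M\t*\t0\t0\tTTGA\tFFFF",
]
print(list(unmapped_read_names(sam)))  # ['read1']
```

For paired-end data, a stricter variant would also check bit 0x8 (mate unmapped) so that a pair is kept only when neither mate aligns; whether the short-read workflow does exactly this is not stated above.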
RNA-Seq Differential Expression Analysis with Visualization
Identifies differentially expressed genes between exactly two experimental conditions from count tables. The workflow performs statistical testing, applies filters based on adjusted p-value and log2 fold change thresholds, and generates publication-quality visualizations including volcano plots, MA plots, and heatmaps. Takes two collections of count tables as input and produces filtered gene lists and interactive plots for interpreting expression differences. Optimal for simple two-condition experimental designs.
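The filtering step can be pictured as a simple predicate on each gene's statistics. This sketch assumes one adjusted p-value and one log2 fold change per gene; the cutoffs 0.05 and 1.0 are hypothetical defaults, not the workflow's documented values. Genes with a missing adjusted p-value (reported as NA by some tools) are dropped.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneResult:
    gene: str
    log2_fold_change: float
    padj: Optional[float]  # None stands in for an NA adjusted p-value


def filter_significant(results: List[GeneResult],
                       padj_cutoff: float = 0.05,
                       lfc_cutoff: float = 1.0) -> List[GeneResult]:
    """Keep genes passing both the adjusted p-value and |log2FC| thresholds."""
    return [
        r for r in results
        if r.padj is not None
        and r.padj < padj_cutoff
        and abs(r.log2_fold_change) >= lfc_cutoff
    ]


results = [
    GeneResult("geneA", 2.3, 0.001),   # up-regulated, significant
    GeneResult("geneB", -1.5, 0.01),   # down-regulated, significant
    GeneResult("geneC", 0.4, 0.0001),  # significant but small effect
    GeneResult("geneD", 3.0, 0.2),     # large effect, not significant
    GeneResult("geneE", 1.2, None),    # NA adjusted p-value
]
print([r.gene for r in filter_significant(results)])  # ['geneA', 'geneB']
```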
Core genome Multilocus Sequence Typing (cgMLST) of a bacterial genome
This workflow performs core genome multilocus sequence typing (cgMLST) on contigs corresponding to one bacterial genome to characterize bacterial strains using curated reference schemes.
Scaffolding with Hi-C data VGP8
This workflow performs the scaffolding of a genome assembly using Hi-C data with YaHS. It can be used on any assembly for which Hi-C data are available, provided the assembly is in GFA format.
Metagenomic Genes Catalogue Analysis
Metagenomic analysis, from raw reads to gene catalog. Uses Megahit to assemble contigs and Prodigal to predict CDSs on the contigs to build the gene catalog. Finally, functional, taxonomic, and antimicrobial resistance information is provided.
Short-read quality control and trimming
This workflow performs quality control and trimming on paired-end Illumina fastq(.gz) files using fastp and aggregates the quality control reports with MultiQC
Genome annotation with Braker3
This workflow performs genome annotation using Braker3 and evaluates the quality of the annotation with BUSCO and genome annotation statistics.
Variant calling and consensus construction from paired end short read data of non-segmented viral genomes
Variant calling and consensus sequence generation for batches of Illumina PE-sequenced viruses with an uncomplicated and stable genome structure (e.g., morbilliviruses).
Post-Assembly Quality Control and Contamination Check for Bacterial Genomes
This workflow performs quality and contamination control analysis on assembled contigs to assess bacterial genome quality and taxonomic assignment
Bacterial Genome Assembly using Shovill
Assembly of bacterial paired-end short read data
Raw Read Quality and Contamination Control For Genome Assembly
Short paired-end read analysis providing quality assessment, read cleaning, and taxonomic assignment directly from raw reads
Metagenomics Taxonomic and Antibiotic Resistance Gene (ARG) Profiling
This workflow starts from metagenomics short-read data and performs taxonomic profiling (using Sylph), predicts Antibiotic Resistance Genes (ARGs) (using Groot and deepARG), and standardizes ARG annotations (using argNorm).
K-mer profiling and reads statistics VGP1
Evaluation of PacBio HiFi reads and genome profiling. Create a Meryl database used for estimating assembly parameters and for quality control with Merqury. Part of the VGP pipeline.
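Meryl counts k-mers at scale; the toy sketch below only illustrates what a k-mer database holds and what the k-mer spectrum is (how many distinct k-mers occur once, twice, and so on), which downstream profiling relies on. It assumes canonical k-mers, i.e. each k-mer is merged with its reverse complement; that is a common convention but an assumption here, as are the helper names.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(reads, k):
    """Count canonical k-mers across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return counts


def kmer_spectrum(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


counts = count_kmers(["ACGTAC", "GTACGT"], k=3)
print(dict(counts))                 # {'ACG': 4, 'GTA': 4}
print(dict(kmer_spectrum(counts)))  # {4: 2}
```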
Genome Assembly from Hifi reads with HiC phasing - VGP4
Assemble a genome using PacBio HiFi and Hi-C data from the same individual for phasing. Prerequisite: run the k-mer profiling workflow (VGP1). This workflow uses HiFiasm for contigging, and generates assembly statistics, BUSCO reports, Merqury histograms, and the genome assembly contigs in fasta and GFA format.
Assembly decontamination VGP9
Decontamination (foreign contaminants and mitochondrial sequences) of a genome assembly after the final scaffolding step. Uses NCBI FCS GX to identify foreign contaminants and Blast to identify mitochondrial sequences. Part of the VGP Suite.
Pox Virus Illumina Amplicon Workflow from half-genomes
A workflow for the analysis of pox virus genomes sequenced as half-genomes (for ITR resolution) in a tiled-amplicon approach
Metagenome-Assembled Genomes (MAGs) generation
This workflow produces Metagenome-Assembled Genomes (MAGs) from paired-end metagenomic reads. It includes assembly, binning, refinement, dereplication, annotation, taxonomic classification, quality assessment, and abundance estimation. All results are summarised in a single integrated report and aggregated tables.
ChIP-seq Analysis: Single-End Read Processing
Complete ChIP-seq analysis for single-end sequencing data. Processes raw FASTQ files through adapter removal (fastp), alignment to reference genome (Bowtie2), and quality filtering (MAPQ greater than 30). Peak calling with MACS2 uses a fixed extension of 200bp to identify protein-DNA binding sites. Generates alignment files, coverage, peak calls, and quality metrics for downstream analysis.
ChIP-seq Analysis: Paired-End Read Processing
Complete ChIP-seq analysis for paired-end sequencing data. Processes raw FASTQ files through adapter removal (fastp), alignment to reference genome (Bowtie2), and stringent quality filtering (MAPQ greater than 30, concordant pairs only). Peak calling with MACS2 optimized for paired-end reads identifies protein-DNA binding sites. Generates alignment files, coverage, peak calls, and quality metrics for downstream analysis.
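The read-level filters in the two ChIP-seq workflows above can be expressed as checks on the SAM FLAG and MAPQ fields. This sketch assumes that "concordant pairs" corresponds to the properly-paired bit (0x2) that Bowtie2 sets for concordant alignments, and it uses a strict MAPQ > 30 as stated; the function name is illustrative.

```python
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4


def passes_filters(flag: int, mapq: int, min_mapq: int = 30,
                   paired_end: bool = True) -> bool:
    """Return True if an alignment survives the MAPQ and pairing filters."""
    if flag & UNMAPPED:
        return False
    if mapq <= min_mapq:  # strict: keep only MAPQ greater than min_mapq
        return False
    if paired_end and not ((flag & PAIRED) and (flag & PROPER_PAIR)):
        return False  # drop mates that are not part of a concordant pair
    return True


print(passes_filters(99, 42))                   # True: paired, proper pair
print(passes_filters(99, 30))                   # False: MAPQ not above 30
print(passes_filters(97, 42))                   # False: paired but not proper
print(passes_filters(0, 31, paired_end=False))  # True: single-end, mapped
```

The CUT&RUN/CUT&TAG description further down uses MAPQ ≥ 30 instead; in this sketch that would correspond to `min_mapq=29`.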
Genome Assembly with Pacbio Hifi reads and Trio data for phasing - VGP5
Generate a phased assembly based on PacBio HiFi reads and parental Illumina data for phasing. Part of the VGP workflow suite, it needs to be run after the Trio k-mer Profiling workflow VGP2. This workflow uses HiFiasm for contigging, and generates assembly statistics, BUSCO reports, Merqury plots, and the genome assembly contigs in fasta and GFA format.
Genome Assembly from Hifi reads - VGP3
Generate a genome assembly based on PacBio HiFi reads. Part of the VGP suite, it needs to be run after the VGP1 k-mer profiling workflow. The assembly contigs are built using HiFiasm, and the workflow generates assembly statistics, BUSCO reports, Merqury plots, and the contigs in fasta and GFA formats.
Mitogenome Assembly VGP0
Generate a mitochondrial assembly based on PacBio HiFi reads. Part of the VGP suite, it can be run at any time independently of the other workflows. This workflow uses MitoHiFi and a mitochondrial reference to assemble the mitochondrial genome from PacBio reads. You do not need to provide the reference yourself, only the Latin name of the species.
Generate Nx and Size plots for multiple assemblies
Generate Nx and size plots for multiple assemblies to compare the evolution of assembly quality through the scaffolding process. Inputs are the fasta files for each assembly to compare.
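Nx is the length of the shortest sequence in the smallest set of longest sequences that together cover at least x% of the total assembly length, so N50 is the x = 50 case. A minimal sketch of the values such a plot shows, assuming the input is simply a list of sequence lengths taken from one fasta file:

```python
def nx_values(lengths, xs=(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)):
    """Map each x to Nx: the length at which cumulative sorted length reaches x%."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    result = {}
    for x in xs:
        threshold = total * x / 100
        cumulative = 0
        for length in ordered:
            cumulative += length
            if cumulative >= threshold:
                result[x] = length
                break
    return result


contigs = [100, 80, 50, 30, 20, 10, 10]
nx = nx_values(contigs)
print(nx[50], nx[90], nx[100])  # 80 20 10
```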
Bacterial Genome Annotation
Annotation of an assembled bacterial genome to detect genes, potential plasmids, integrons, and insertion sequence (IS) elements.
Multiplex Tissue Microarray Analysis
Perform background subtraction, nuclear segmentation, feature quantification, cellular phenotyping, spatial analysis, and interactive visualization of registered TMA core multiplex tissue images
End-to-End Tissue Microarray Analysis
Complete multiplex tissue image (MTI) analysis pipeline for tissue microarray (TMA) data imaged using cyclic immunofluorescence: Performs illumination correction, stitching and registration, and tissue microarray segmentation. Tissue-segmented images undergo nuclear segmentation, cell/nuclei feature quantification (mean marker intensities, cell coordinates, and morphological features), and cell phenotyping. Produces outputs that are compatible with downstream single-cell/spatial analysis and interactive image viewers including: Pyramidal OME-TIFF images, nuclear segmentation masks (TIFF), quantified feature tables (CSV, h5ad) with cell type annotations, and an interactive Vitessce dashboard that combines image viewing with linked single-cell data visualizations.
PretextMap Generation from 1 or 2 haplotypes
This workflow generates Hi-C contact maps for genome assemblies in the Pretext format. It is compatible with one or two haplotypes. It includes tracks for PacBio read coverage, gaps, and telomeres. The Pretext files can be opened in PretextView for the manual curation of genome assemblies.
Single-Cell RNA-seq Preprocessing: 10X Genomics CellPlex Multiplexed Samples
Comprehensive preprocessing for 10X Genomics CellPlex multiplexed single-cell RNA-seq data. Processes Cell Multiplexing Oligo (CMO) FASTQ files with CITE-seq-Count including required CellPlex-specific translation steps. Simultaneously processes gene expression FASTQ files with STARsolo and DropletUtils for alignment and cell filtering, and formats outputs for seamless import into Seurat/Scanpy (Read10X function).
Single-Cell RNA-seq Preprocessing: 10X Genomics v3 to Seurat and Scanpy Compatible Format
Complete preprocessing pipeline for 10X Genomics v3 single-cell RNA-seq data. Aligns raw FASTQ files using STARsolo, performs cell calling and quality filtering with DropletUtils, and formats outputs for seamless import into Seurat/Scanpy (Read10X function).
Genome assembly with Flye
Assemble long reads with Flye, then view assembly statistics and assembly graph
Mass spectrometry: LC-MS preprocessing with XCMS
This workflow is built on the XCMS R package (Smith, C.A. 2006), which can extract, filter, and align peaks and fill gaps, with the option to annotate isotopes, adducts, and fragments using the CAMERA R package (Kuhl, C. 2012). Tutorial: https://training.galaxyproject.org/training-material/topics/metabolomics/tutorials/lcms-preprocessing/tutorial.html
Mass spectrometry: GCMS with metaMS
This workflow combines the XCMS R package (Smith, C.A. 2006), used for peak extraction, with the metaMS R package (Wehrens, R. 2014) for untargeted metabolomics. Tutorial: https://training.galaxyproject.org/training-material/topics/metabolomics/tutorials/gcms/tutorial.html
MGnify's amplicon pipeline v5.0
MGnify's amplicon pipeline v5.0, including the quality control subworkflows for single-end and paired-end reads, the rRNA prediction subworkflow, and the ITS subworkflow.
MGnify's amplicon pipeline v5.0 - Quality control PE
Quality control subworkflow for paired-end reads.
Influenza A isolate subtyping and consensus sequence generation
This workflow performs subtyping and consensus sequence generation for batches of Illumina PE-sequenced Influenza A isolates.
kmer-profiling-hifi-trio-VGP2
Create a Meryl database used for estimating assembly parameters and for quality control with Merqury. Part of the VGP pipeline.
QCxMS Spectra Prediction from SDF
Workflow to predict EI mass spectra using QCxMS starting from a single SDF file, containing the 3D coordinates of all atoms in the molecule. These files can typically be obtained from PubChem.
SARS-CoV-2 Illumina Amplicon pipeline - iVar based
Find and annotate variants in ampliconic SARS-CoV-2 Illumina sequencing data and classify samples with pangolin and nextclade
Molecular formula assignment and recalibration with the MFAssignR package
This workflow can be used to assign multi-element molecular formulas to ultrahigh resolution mass spectra.
COVID-19: variation analysis on ARTIC PE data
The workflow for Illumina-sequenced ARTIC data builds on the RNASeq workflow for paired-end data, using the same steps for mapping and variant calling, but adds extra logic for trimming ARTIC primer sequences off reads with the ivar package. In addition, this workflow uses ivar to identify amplicons affected by ARTIC primer-binding site mutations and tries to exclude reads derived from such tainted amplicons when calculating the allele frequencies of other variants.
COVID-19: consensus construction
Build a consensus sequence from FILTER PASS variants with intrasample allele-frequency above a configurable consensus threshold. Hard-mask regions with low coverage (but not consensus variants within them) and ambiguous sites.
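The consensus logic can be sketched for SNVs only. Several details are assumptions not stated in the description: variants arrive as (0-based position, ref, alt, allele frequency, FILTER) tuples, per-position depth is available as a list, "ambiguous sites" are taken to be PASS variants whose allele frequency lies between a hypothetical lower bound `ambiguous_af` and the consensus threshold, and all numeric defaults are illustrative.

```python
def build_consensus(reference, variants, depth,
                    consensus_af=0.7, ambiguous_af=0.25, min_depth=5):
    """Apply high-AF PASS SNVs, then hard-mask low-coverage and ambiguous sites."""
    seq = list(reference)
    consensus_positions = set()
    ambiguous_positions = set()
    for pos, ref, alt, af, filt in variants:
        if filt != "PASS":
            continue  # only FILTER=PASS variants are considered
        if reference[pos] != ref:
            raise ValueError(f"REF mismatch at position {pos}")
        if af >= consensus_af:
            seq[pos] = alt
            consensus_positions.add(pos)
        elif af >= ambiguous_af:
            ambiguous_positions.add(pos)
    for i, d in enumerate(depth):
        if i in consensus_positions:
            continue  # consensus variants are never masked, even at low depth
        if d < min_depth or i in ambiguous_positions:
            seq[i] = "N"
    return "".join(seq)


reference = "ACGTACGTAC"
depth = [10, 10, 10, 2, 2, 10, 10, 10, 10, 10]
variants = [
    (1, "C", "T", 0.95, "PASS"),    # applied
    (3, "T", "G", 0.90, "PASS"),    # applied despite low depth
    (6, "G", "A", 0.40, "PASS"),    # ambiguous -> N
    (8, "A", "C", 0.99, "LowQual"), # ignored (not PASS)
]
print(build_consensus(reference, variants, depth))  # ATGGNCNTAC
```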
COVID-19: variation analysis on WGS SE data
This workflow performs single-end read mapping with bowtie2 followed by sensitive variant calling across a wide range of allele frequencies (AFs) with lofreq.
MGnify's amplicon pipeline v5.0 - Quality control SE
Quality control subworkflow for single-end reads.
MAPseq to ampvis2
The MAPseq to Ampvis workflow processes MAPseq OTU tables and associated metadata for analysis in Ampvis2. This workflow involves reformatting MAPseq output datasets to produce structured output files suitable for Ampvis2.
Taxonomic abundance summary tables for a specified taxonomic rank
This workflow creates taxonomic summary tables for a specified taxonomic rank out of MAPseq's OTU tables output collection.
MGnify's amplicon pipeline v5.0 - rRNA prediction
Classification and visualization of SSU, LSU sequences.
MGnify amplicon summary tables
This workflow creates taxonomic summary tables out of the amplicon pipeline results.
Single-Cell Mixture Analysis: baredSC 1D Log-Normalized Models
Applies the baredSC algorithm to fit and combine one-dimensional Gaussian mixture models (from 1 to N components) on log-normalized single-cell gene expression data. Enables identification of subpopulations based on expression of genes of interest and provides statistical assessment of the optimal number of components in heterogeneous cell populations.
Single-Cell Mixture Analysis: baredSC 2D Log-Normalized Models
Applies the baredSC algorithm to fit and combine two-dimensional Gaussian mixture models (from 1 to N components) on log-normalized single-cell gene expression data. Enables identification of subpopulations based on co-expression patterns of two genes of interest and provides statistical assessment of the optimal number of components in heterogeneous cell populations.
Genome annotation with Maker
This workflow allows for genome annotation using Maker and evaluates the quality of the annotation.
lncRNAs annotation workflow
This workflow runs the FEELnc tool to annotate long non-coding RNAs. Before annotating these long non-coding RNAs, StringTie assembles the RNA-seq alignments into potential transcripts. The gffread tool provides a genome annotation file in GTF format.
dada2 amplicon analysis pipeline - for paired end data
dada2 amplicon analysis for paired-end data. The workflow has three main outputs:
- the sequence table (output of makeSequenceTable)
- the taxonomy (output of assignTaxonomy)
- the counts, which allow tracking the number of sequences in the samples through the steps (output of sequence counts)
RNA-Seq Analysis: Paired-End Read Processing and Quantification
Complete RNA-Seq analysis for paired-end data: Processes raw FASTQ data through adapter and low-quality read removal (fastp), alignment with STAR using ENCODE parameters, gene quantification via multiple methods (STAR and featureCounts), and expression calculation (FPKM with Cufflinks/StringTie, normalized coverage with bedtools). Produces count tables, normalized expression values, and genomic coverage tracks. Supports stranded and unstranded libraries, generating both HTSeq-compatible counts and normalized measures for downstream analysis.
RNA-Seq Analysis: Single-End Read Processing and Quantification
Complete RNA-Seq analysis for single-end data: Processes raw FASTQ data through adapter and low-quality read removal (fastp), alignment with STAR using ENCODE parameters, gene quantification via multiple methods (STAR and featureCounts), and expression calculation (FPKM with Cufflinks/StringTie, normalized coverage with bedtools). Produces count tables, normalized expression values, and genomic coverage tracks. Supports stranded and unstranded libraries, generating both HTSeq-compatible counts and normalized measures for downstream analysis.
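FPKM normalizes a gene's fragment count by both gene length and sequencing depth: FPKM = count × 10^9 / (gene length in bp × total mapped fragments). Cufflinks and StringTie estimate abundances with their own statistical models, so this sketch only shows the textbook formula; the input dictionaries are hypothetical, and falling back to the sum of the given counts when no total is supplied is a simplification.

```python
def fpkm(counts, lengths_bp, total_fragments=None):
    """FPKM = count * 1e9 / (length_bp * total mapped fragments)."""
    total = total_fragments if total_fragments is not None else sum(counts.values())
    return {gene: c * 1e9 / (lengths_bp[gene] * total) for gene, c in counts.items()}


counts = {"geneA": 500, "geneB": 1500}
lengths = {"geneA": 2000, "geneB": 500}
print(fpkm(counts, lengths, total_fragments=10_000_000))
# {'geneA': 25.0, 'geneB': 300.0}
```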
Consensus Peak Calling for ChIP-seq Single-End Replicates
Identifies high-confidence consensus peaks from ChIP-seq single-end replicate experiments. The workflow calls peaks on individual replicates and identifies their intersection. To control for sequencing depth differences, it subsamples all replicates to the smallest library size, performs peak calling on the combined normalized data, and retains only peaks whose summits overlap with intersections from a user-defined minimum number of replicates.
Consensus Peak Calling for ChIP-seq Paired-End Replicates
Identifies high-confidence consensus peaks from ChIP-seq paired-end replicate experiments. The workflow calls peaks on individual replicates and identifies their intersection. To control for sequencing depth differences, it subsamples all replicates to the smallest library size, performs peak calling on the combined normalized data, and retains only peaks whose summits overlap with intersections from a user-defined minimum number of replicates.
Consensus Peak Calling for ATAC-seq and CUT&RUN Replicates
Identifies high-confidence consensus peaks from ATAC-seq or CUT&RUN replicate experiments. The workflow calls peaks on individual replicates and identifies their intersection. To control for sequencing depth differences, it subsamples all replicates to the smallest library size, performs peak calling on the combined normalized data, and retains only peaks whose summits overlap with intersections from a user-defined minimum number of replicates.
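The three consensus peak-calling workflows above share the same logic. This sketch assumes BED-like half-open (chrom, start, end) peaks, pooled peaks that carry an absolute summit coordinate, and that "overlapping the intersection" can be checked at the summit by counting how many replicates have a peak covering it. Subsampling is shown only as the fraction of reads each replicate would keep; the function names are illustrative.

```python
def subsample_fractions(library_sizes):
    """Fraction of reads to keep per replicate so all match the smallest library."""
    smallest = min(library_sizes.values())
    return {name: smallest / size for name, size in library_sizes.items()}


def covers(peaks, chrom, pos):
    """True if any half-open (chrom, start, end) peak contains pos."""
    return any(c == chrom and s <= pos < e for c, s, e in peaks)


def consensus_peaks(pooled_peaks, replicate_peaks, min_replicates):
    """Keep pooled peaks whose summit lies in peaks of >= min_replicates replicates."""
    kept = []
    for chrom, start, end, summit in pooled_peaks:
        support = sum(covers(peaks, chrom, summit) for peaks in replicate_peaks.values())
        if support >= min_replicates:
            kept.append((chrom, start, end, summit))
    return kept


print(subsample_fractions({"rep1": 20_000_000, "rep2": 10_000_000, "rep3": 40_000_000}))
# {'rep1': 0.5, 'rep2': 1.0, 'rep3': 0.25}
replicates = {
    "rep1": [("chr1", 100, 300), ("chr1", 1000, 1200)],
    "rep2": [("chr1", 150, 350)],
    "rep3": [("chr1", 120, 280), ("chr2", 50, 150)],
}
pooled = [("chr1", 110, 320, 200), ("chr1", 980, 1210, 1100), ("chr2", 40, 160, 100)]
print(consensus_peaks(pooled, replicates, min_replicates=2))
# [('chr1', 110, 320, 200)]
```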
CUT&RUN/CUT&TAG Analysis: Protein-DNA Interaction Mapping
Complete CUT&RUN/CUT&TAG analysis workflow for paired-end sequencing data. Processes raw FASTQ files through adapter removal (cutadapt) and alignment (Bowtie2 with dovetail option enabled). Applies quality filtering (MAPQ ≥ 30, concordant pairs only), converts BAM to BED format, and performs peak calling using MACS2 with parameters optimized for the punctate signal profile characteristic of CUT&RUN/CUT&TAG experiments.
Clinical Metaproteomics Verification Workflow
In proteomics research, verifying detected peptides is essential for ensuring data accuracy and biological relevance. This tutorial continues from the clinical metaproteomics discovery workflow, focusing on verifying identified microbial peptides using the PepQuery tool.
Functional annotation of protein sequences
This workflow uses eggNOG mapper and InterProScan for functional annotation of protein sequences.
ATAC-seq Analysis: Chromatin Accessibility Profiling
Complete ATAC-seq analysis pipeline for paired-end reads. Processes raw FASTQ data through adapter and low-quality read removal (cutadapt), alignment (Bowtie2 end-to-end), and filtering (removes MT reads, discordant pairs, low mapping quality <30, PCR duplicates). Generates 5' cut site pileups (±100bp), performs peak calling, and quantifies reads in 1kb summit-centered regions. Produces two normalized coverage tracks (per million mapped reads and per million reads in peaks) and fragment length distribution plots for quality assessment.
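Two of these ATAC-seq steps are easy to state in code: turning each read into a window around its 5' end (the cut site), and computing a per-million scaling factor for coverage. This is a sketch under assumptions: coordinates are 0-based half-open, no Tn5 offset shift is applied because the description does not mention one, and the helper names are hypothetical.

```python
def cut_site_window(chrom, start, end, strand, half_width=100):
    """Window of +/- half_width bp around a read's 5' end (0-based, half-open read)."""
    cut = start if strand == "+" else end - 1
    return chrom, max(0, cut - half_width), cut + half_width


def per_million_scale(n_reads):
    """Scaling factor that expresses coverage per million reads."""
    return 1_000_000 / n_reads


reads = [("chr1", 500, 550, "+"), ("chr1", 500, 550, "-"), ("chr1", 50, 90, "+")]
print([cut_site_window(*r) for r in reads])
# [('chr1', 400, 600), ('chr1', 449, 649), ('chr1', 0, 150)]
print(per_million_scale(20_000_000))  # 0.05
```

The same factor applied with the number of reads in peaks, instead of all mapped reads, gives the second normalized track described above.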
Clinical Metaproteomics Data Interpretation
This workflow will perform taxonomic and functional annotations using Unipept and statistical analysis using MSstatsTMT.
Clinical Metaproteomics Discovery Workflow
Workflow for clinical metaproteomics database searching
Generate a Clinical Metaproteomics Database
The workflow begins with the Database Generation process. The Galaxy-P team has developed a workflow that collects protein sequences from known disease-causing microorganisms to build a comprehensive database. This extensive database is then refined into a smaller, more relevant dataset using the Metanovo tool.
Single-Cell Pseudobulk Differential Expression Analysis with edgeR
Performs differential gene expression analysis on single-cell data using a pseudobulk approach. Aggregates cell-level counts from an annotated AnnData object by cell type or other metadata using Decoupler, then applies edgeR for robust statistical testing between conditions. Includes data preprocessing steps for compatibility with edgeR and generates interactive volcano plots to visualize significantly differentially expressed genes.
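Pseudobulk aggregation sums the raw counts of all cells that share a sample and a cell-type label, producing one bulk-like profile per group for edgeR. Decoupler performs this on an AnnData object; the plain-Python sketch below only mirrors the summation, with cells represented as hypothetical (sample, cell_type) labels and per-cell count dictionaries.

```python
from collections import defaultdict


def pseudobulk(cell_labels, cell_counts):
    """Sum per-cell gene counts within each (sample, cell_type) group."""
    totals = defaultdict(lambda: defaultdict(int))
    for label, counts in zip(cell_labels, cell_counts):
        for gene, value in counts.items():
            totals[label][gene] += value
    return {label: dict(genes) for label, genes in totals.items()}


labels = [("s1", "T"), ("s1", "T"), ("s1", "B"), ("s2", "T")]
counts = [{"CD3E": 3, "MS4A1": 0}, {"CD3E": 5}, {"MS4A1": 7}, {"CD3E": 2}]
print(pseudobulk(labels, counts))
# {('s1', 'T'): {'CD3E': 8, 'MS4A1': 0}, ('s1', 'B'): {'MS4A1': 7}, ('s2', 'T'): {'CD3E': 2}}
```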
Segmentation and counting of cell nuclei in fluorescence microscopy images
This workflow performs segmentation and counting of cell nuclei using fluorescence microscopy images. The segmentation step is performed using Otsu thresholding (Otsu, 1979). The workflow is based on the tutorial: https://training.galaxyproject.org/training-material/topics/imaging/tutorials/imaging-introduction/tutorial.html
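Otsu's method picks the intensity threshold that maximizes the between-class variance of the two resulting pixel classes; counting nuclei then amounts to counting connected foreground regions. This sketch assumes an 8-bit grayscale image given as nested lists and 4-connectivity for counting; the Galaxy tools used in the workflow may differ in connectivity and pre-processing.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance (foreground: > t)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    weighted_total = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels <= t
    sum0 = 0  # intensity sum of pixels <= t
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (weighted_total - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def count_objects(mask):
    """Count 4-connected regions of True pixels with an iterative flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    n = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                n += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return n


image = [
    [10, 10, 200, 10],
    [10, 10, 210, 10],
    [200, 10, 10, 10],
]
t = otsu_threshold([p for row in image for p in row])
mask = [[p > t for p in row] for row in image]
print(t, count_objects(mask))  # 10 2
```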
QIIME2 Ia: multiplexed data (single-end)
Importing single-end multiplexed data (not demultiplexed yet)
QIIME2 Ib: multiplexed data (paired-end)
Importing paired-end multiplexed data (not demultiplexed yet)
QIIME2 IIa: Denoising (sequence quality control) and feature table creation (single-end)
Use DADA2 for sequence quality control. DADA2 is a pipeline for detecting and correcting (where possible) errors in Illumina amplicon sequence data. As implemented in the q2-dada2 plugin, this quality control process will additionally filter any phiX reads (commonly present in marker gene Illumina sequence data) that are identified in the sequencing data, and will filter chimeric sequences.
QIIME2 IIb: Denoising (sequence quality control) and feature table creation (paired-end)
Use DADA2 for sequence quality control. DADA2 is a pipeline for detecting and correcting (where possible) errors in Illumina amplicon sequence data. As implemented in the q2-dada2 plugin, this quality control process will additionally filter any phiX reads (commonly present in marker gene Illumina sequence data) that are identified in the sequencing data, and will filter chimeric sequences.
QIIME2-III-V-Phylogeny-Rarefaction-Taxonomic-Analysis
This workflow performs:
- phylogeny reconstruction (insertion of fragments into a reference)
- alpha rarefaction analysis
- taxonomic analysis
QIIME2 VI: Diversity metrics and estimations
The first step in hypothesis testing in microbial ecology is typically to look at within- (alpha) and between-sample (beta) diversity. We can calculate diversity metrics, apply appropriate statistical tests, and visualize the data using the q2-diversity plugin.
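For a concrete sense of what alpha and beta diversity metrics compute, here are two common ones: Shannon diversity within one sample and Bray-Curtis dissimilarity between two samples. This is a plain sketch, not the q2-diversity implementation; the logarithm base in particular differs between tools, so it is left as a parameter.

```python
import math


def shannon(counts, base=math.e):
    """Shannon diversity H = -sum(p_i * log(p_i)) over observed features."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors over the same features."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


print(shannon([10, 10], base=2))            # two equally abundant features: 1 bit
print(bray_curtis([10, 0, 5], [0, 10, 5]))  # 0.6666666666666666
```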
Gene Ontology and KEGG Pathway Enrichment Analysis
Performs functional enrichment analysis of gene sets using GOseq, identifying over-represented Gene Ontology terms and KEGG pathways. This workflow accounts for gene length bias in RNA-seq data when calculating enrichment statistics. Input requires differentially expressed genes and gene lengths. Generates comprehensive tables and visualizations of enriched GO terms across all three ontologies (Biological Process, Molecular Function, Cellular Component) as well as KEGG pathway enrichment results.
Paired end variant calling in haploid system
Workflow for variant analysis against a reference genome in GenBank format
AMR Gene Detection
Antimicrobial resistance gene detection from assembled bacterial genomes
Single-Cell RNA-seq Analysis: Scanpy Preprocessing and Clustering
End-to-end analysis of single-cell RNA-seq data using the Scanpy/AnnData ecosystem. Imports count matrices, applies quality control filtering of low-quality cells and genes, normalizes and scales data, performs dimension reduction (PCA), and identifies cell clusters using the Louvain algorithm. Generates publication-ready visualizations including UMAP plots colored by clusters and marker genes.
BREW3R
Extends 3' ends of gene annotations using BAM files (from STAR alignments) and a reference GTF. Specifically designed for 3'-biased sequencing techniques like 10X scRNA-seq or BRB-seq that primarily capture transcript 3' ends. The BREW3R tool enhances annotations by using evidence from RNA-seq data to improve 3' UTR definitions, which is particularly important for accurate quantification in single-cell and bulk RNA-seq experiments.
COVID-19: variation analysis reporting
This workflow takes a VCF dataset of variants produced by any of the *-variant-calling workflows in https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling and generates tabular lists of variants by Samples and by Variant, and an overview plot of variants and their allele frequencies.
Clinical Metaproteomics Quantitation
Clinical Metaproteomics 4: Quantitation
Scaffolding-BioNano-VGP7
Scaffold a genome assembly using Bionano optical map data. Part of the VGP suite, it can be run on the assembly in GFA format generated by the contigging step in workflows 3, 4, or 5. If you used a different method to generate your assembly, you can use the gfastats tool in Galaxy to generate a GFA from your assembly fasta, and create a simple text file containing the estimated genome size.
sra_manifest_to_concatenated_fastqs_parallel
This workflow takes as input an SRA_manifest from SRA Run Selector and generates one fastq file or pair of fastq files for each experiment (concatenating multiple runs if necessary). Outputs are relabelled to match the column specified by the user.
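Grouping runs by experiment is the core of the concatenation step. The sketch assumes a comma-separated Run Selector table with `Run` and `Experiment` columns (check the header of your own export) and a user-chosen label column; the accessions below are hypothetical placeholders. It only plans which runs are concatenated under which label and downloads nothing.

```python
import csv
import io


def plan_concatenation(manifest_text, label_column, group_column="Experiment"):
    """Return {experiment: {"label": ..., "runs": [...]}} from an SRA run table."""
    plan = {}
    for row in csv.DictReader(io.StringIO(manifest_text)):
        entry = plan.setdefault(row[group_column], {"label": row[label_column], "runs": []})
        entry["runs"].append(row["Run"])
    return plan


manifest = (
    "Run,Experiment,sample_name\n"
    "SRR0000001,SRX0000001,liver_A\n"
    "SRR0000002,SRX0000001,liver_A\n"
    "SRR0000003,SRX0000002,liver_B\n"
)
for experiment, entry in plan_concatenation(manifest, "sample_name").items():
    print(experiment, entry["label"], entry["runs"])
# SRX0000001 liver_A ['SRR0000001', 'SRR0000002']
# SRX0000002 liver_B ['SRR0000003']
```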
MetaProSIP OpenMS 2.8
Automated inference of stable isotope incorporation rates in proteins for functional metaproteomics
Parallel Accession Download
Downloads fastq files for sequencing run accessions provided in a text file using fasterq-dump. Creates one job per listed run accession.
Pathogen Detection PathoGFAIR Samples Aggregation and Visualisation
Report generation and visualization of the pathogens detected across all samples
Gene-based Pathogen Identification
Nanopore dataset analysis: phylogenetic identification, antibiotic resistance gene detection, and contig building
RNA Velocity Analysis: Velocyto for 10X Data from Bundled Output
Processes 10X Genomics single-cell RNA-seq data using Velocyto to quantify spliced and unspliced transcript counts for RNA velocity analysis. Automatically extracts cell barcodes from standard 10X bundled output and generates a loom file containing separate counts for spliced exons, unspliced introns, and ambiguous regions. Enables downstream trajectory inference and cellular dynamics analysis.
RNA Velocity Analysis: Velocyto for 10X Data with Filtered Barcodes
Processes 10X Genomics single-cell RNA-seq data using Velocyto to quantify spliced and unspliced transcript counts for RNA velocity analysis. Takes pre-filtered cell barcodes as input and generates a loom file containing separate counts for spliced exons, unspliced introns, and ambiguous regions. Enables downstream trajectory inference and cellular dynamics analysis in tools like scVelo.
Create GRO and TOP complex files
Fragment-based virtual screening using rDock for docking and SuCOS for pose scoring
Virtual screening of the SARS-CoV-2 main protease with rDock and pose scoring
dcTMD calculations with GROMACS
Perform dcTMD free energy simulations and calculations
COVID-19: variation analysis on WGS PE data
This workflow performs paired-end read mapping with bwa-mem followed by sensitive variant calling across a wide range of allele frequencies (AFs) with lofreq.
COVID-19: variation analysis of ARTIC ONT data
This workflow for ONT-sequenced ARTIC data is modeled after the alignment/variant-calling steps of the [ARTIC pipeline](https://artic.readthedocs.io/en/latest/). It performs essentially the same steps as that pipeline’s minion command, i.e. read mapping with minimap2 and variant calling with medaka. Like the Illumina ARTIC workflow, it uses ivar for primer trimming. Since ONT-sequenced reads have a much higher error rate than Illumina-sequenced reads and are therefore more prone to false-positive variant calls, this workflow makes no attempt to handle amplicons affected by potential primer-binding site mutations.
Generic variation analysis reporting
This workflow takes a VCF dataset of variants produced by any of the variant calling workflows in https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling and generates tabular lists of variants by Samples and by Variant, and an overview plot of variants and their allele frequencies.
Generic variation analysis on WGS PE data
Workflow for variant analysis against a reference genome in GenBank format
BigWig Replicates Averaging Workflow
Calculates average signal values across replicate BigWig files to generate a single consolidated track. Requires input file identifiers formatted as "sample_name_replicateID". The tool automatically groups files by sample name, computes the mean signal at each position, and outputs consolidated BigWig files named after the common sample prefix (without the replicate ID suffix).
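The grouping and averaging rule can be sketched independently of the BigWig format. Here each track is a hypothetical list of per-position values of equal length, and the sample name is taken as everything before the last underscore; that split rule is an assumption which lets sample names themselves contain underscores.

```python
from collections import defaultdict


def group_replicates(identifiers):
    """Group 'sample_name_replicateID' identifiers by the part before the last '_'."""
    groups = defaultdict(list)
    for ident in identifiers:
        sample, _, _replicate = ident.rpartition("_")
        groups[sample].append(ident)
    return dict(groups)


def average_tracks(tracks):
    """Average equal-length per-position value lists within each sample group."""
    averaged = {}
    for sample, members in group_replicates(tracks).items():
        columns = zip(*(tracks[m] for m in members))
        averaged[sample] = [sum(col) / len(members) for col in columns]
    return averaged


tracks = {
    "ctrl_rep1": [1.0, 2.0, 3.0],
    "ctrl_rep2": [3.0, 2.0, 1.0],
    "treat_a_r1": [0.0, 4.0, 8.0],
}
print(average_tracks(tracks))
# {'ctrl': [2.0, 2.0, 2.0], 'treat_a': [0.0, 4.0, 8.0]}
```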
Repeat masking with RepeatModeler and RepeatMasker
Hi-C Processing: FASTQ to Balanced Cool Files
Comprehensive Hi-C data processing workflow that converts paired-end FASTQ files to balanced contact matrices. Uses HiCUP for pre-processing (adapter trimming, mapping, filtering) with fragment midpoint coordinate assignment, filters reads by mapping quality, and generates sorted tabix files. The final output is a balanced cool file at user-specified resolution created with cooler, ready for multi-resolution contact matrix analysis.
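Balancing in cooler is based on iterative correction (ICE): per-bin bias factors are updated repeatedly until every row of the contact matrix has the same total. The sketch below shows only that core loop on a small dense symmetric matrix; cooler additionally filters low-coverage bins, can ignore diagonals, and works on sparse data, none of which is reproduced here.

```python
def ice_balance(matrix, max_iter=500, tol=1e-8):
    """Iteratively correct a symmetric contact matrix so all row sums match."""
    n = len(matrix)
    m = [list(map(float, row)) for row in matrix]
    bias = [1.0] * n
    for _ in range(max_iter):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        mean = sum(nonzero) / len(nonzero)
        # Relative coverage of each bin; empty bins are left untouched.
        factors = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
        if max(abs(f - 1.0) for f in factors) < tol:
            break
    return m, bias


raw = [[4, 2, 1], [2, 6, 3], [1, 3, 2]]
balanced, bias = ice_balance(raw)
print([round(sum(row), 6) for row in balanced])  # all rows now share one total
```

By construction, each balanced entry equals the raw count divided by `bias[i] * bias[j]`.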
Capture Hi-C Processing: FASTQ to Balanced Cool Files
End-to-end processing of Capture Hi-C paired-end sequencing data. Transforms raw FASTQ files into balanced contact matrices using HiCUP for processing Hi-C reads (adapter trimming, mapping, filtering for valid pairs) with additional filtering for MAPQ and captured regions. The workflow generates sorted tabix files via cooler and produces balanced cool files at user-specified resolution for downstream analysis and visualization.
Hi-C Format Conversion: Juicer Medium to Cooler Files
Converts Hi-C interaction data from Juicer medium format (tabix files) to balanced cooler format. This streamlined workflow takes a collection of Juicer medium tabix files and a reference genome name as input and generates balanced cool files at user-specified resolution. Enables seamless transition between different Hi-C analysis ecosystems while maintaining data integrity.
Hi-C Data Processing: FASTQ to Valid Interaction Pairs
Processes Hi-C paired-end FASTQ files to generate validated interaction pairs using HiCUP. The workflow truncates reads at ligation junctions, maps them to the reference genome, assigns them to restriction fragments, and filters out experimental artifacts (self-ligated, dangling ends, internal fragments, and size outliers). Removes duplicates and converts outputs to formats compatible with Juicebox/cooler using fragment midpoints as coordinates. Final filtering by mapping quality ensures high-confidence interaction data.
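The fragment-assignment step can be sketched with a binary search over restriction-site positions. Assumptions: a single chromosome, 0-based positions strictly less than the chromosome length, fragments as half-open intervals between consecutive cut sites, and a pair whose two ends fall on the same fragment is simply discarded. HiCUP distinguishes several such artifact classes, which this sketch does not.

```python
import bisect


def fragment_of(pos, cut_sites, chrom_length):
    """Return (index, start, end) of the restriction fragment containing pos."""
    boundaries = [0] + sorted(cut_sites) + [chrom_length]
    i = bisect.bisect_right(boundaries, pos) - 1
    return i, boundaries[i], boundaries[i + 1]


def pair_to_midpoints(pos1, pos2, cut_sites, chrom_length):
    """Replace each read end by its fragment midpoint; drop same-fragment pairs."""
    idx1, start1, end1 = fragment_of(pos1, cut_sites, chrom_length)
    idx2, start2, end2 = fragment_of(pos2, cut_sites, chrom_length)
    if idx1 == idx2:
        return None  # same fragment: treated as an artifact in this sketch
    return (start1 + end1) // 2, (start2 + end2) // 2


cuts = [100, 250, 400]
print(pair_to_midpoints(150, 450, cuts, 500))  # (175, 450)
print(pair_to_midpoints(120, 200, cuts, 500))  # None
```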