Genomics
Bulk Sequencing
STAR Generate Genome Index capsule
Generates the genome index files required to run STAR RNA-seq alignment.
Genome DNA .fasta
Genome gene annotation .gtf/.gff
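Index generation normally comes down to a single `STAR --runMode genomeGenerate` call. The sketch below only assembles that command as an argument list. The file names, the thread count and the read length are assumptions, not the capsule's settings. `--sjdbOverhang` is set to read length minus 1. For GFF3 annotations, STAR's manual recommends `--sjdbGTFtagExonParentTranscript Parent`.

```python
def star_index_cmd(fasta, annotation, genome_dir, threads=8, read_length=101):
    """Build a STAR genome-index command (sketch; parameter values are illustrative)."""
    cmd = [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", annotation,
        # Splice-junction overhang: ideally read length - 1.
        "--sjdbOverhang", str(read_length - 1),
    ]
    # GFF3 files link exons to transcripts through the "Parent" attribute.
    if annotation.endswith((".gff", ".gff3")):
        cmd += ["--sjdbGTFtagExonParentTranscript", "Parent"]
    return cmd

# Run with e.g. subprocess.run(star_index_cmd("genome.fa", "genes.gtf", "star_index"), check=True)
```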
STAR Alignment
RNA-seq alignment. STAR is splice-aware: it can align reads that span exon-exon junctions, which is one of the main challenges of mapping RNA-seq reads back to a DNA reference genome.
Short/long read .fastq
STAR Index
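Here is a minimal sketch of one per-sample alignment call against the index built above. It assumes gzipped FASTQ input, which is why `zcat` is passed to `--readFilesCommand`. It also assumes coordinate-sorted BAM output. The directory and file names are hypothetical. With these options, STAR writes `<prefix>Aligned.sortedByCoord.out.bam`.

```python
def star_align_cmd(genome_dir, reads, out_prefix, threads=8):
    """Build a STAR alignment command for one sample (single- or paired-end)."""
    if len(reads) not in (1, 2):
        raise ValueError("expected one (single-end) or two (paired-end) FASTQ files")
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]
    # Let STAR decompress gzipped FASTQ on the fly.
    if all(r.endswith(".gz") for r in reads):
        cmd += ["--readFilesCommand", "zcat"]
    return cmd
```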
Salmon Preparing Transcriptome Indices for Mapping-Based Mode
Generates the index files needed to run Salmon quantification from a transcriptome (RNA transcript) FASTA file and a genome (DNA) FASTA file.
Genome DNA .fasta
Transcripts RNA .fasta
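Salmon's documentation describes a decoy-aware index. The genome is appended after the transcripts into a "gentrome" file, and the genome's sequence names are listed in a decoys file. It is an assumption that this capsule uses the genome FASTA in exactly this way. The sketch below shows how both files and the `salmon index` command could be prepared. The file names and `k=31` are illustrative.

```python
import gzip

def _open_text(path):
    """Open a plain or gzipped FASTA for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)

def fasta_names(path):
    """Sequence names: the first whitespace-delimited token after '>'."""
    with _open_text(path) as handle:
        return [line[1:].split()[0] for line in handle if line.startswith(">")]

def write_gentrome(transcripts, genome, gentrome_path, decoys_path):
    """Concatenate transcripts then genome, and list genome sequences as decoys."""
    with open(gentrome_path, "w") as out:
        for path in (transcripts, genome):  # transcripts must come first
            with _open_text(path) as handle:
                for line in handle:
                    # Guard against a missing final newline merging two records.
                    out.write(line if line.endswith("\n") else line + "\n")
    with open(decoys_path, "w") as out:
        out.write("\n".join(fasta_names(genome)) + "\n")

def salmon_index_cmd(gentrome_path, decoys_path, index_dir, threads=8, k=31):
    return ["salmon", "index", "-t", gentrome_path, "-d", decoys_path,
            "-i", index_dir, "-k", str(k), "-p", str(threads)]
```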
Salmon: mapping-based quantification
RNA-seq quantification. Salmon is designed for speed and focuses on estimating transcript abundances rather than on producing precise base-level read alignments.
Short/long read .fastq
Salmon Index
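Here is a minimal sketch of the quantification call and of reading its main output. `-l A` lets Salmon infer the library type automatically. Salmon writes a tab-separated `quant.sf` with the columns `Name`, `Length`, `EffectiveLength`, `TPM` and `NumReads`. The index, read and output names are hypothetical.

```python
import csv

def salmon_quant_cmd(index_dir, reads, out_dir, threads=8):
    """Build a `salmon quant` command; -l A auto-detects the library type."""
    cmd = ["salmon", "quant", "-i", index_dir, "-l", "A",
           "-p", str(threads), "-o", out_dir]
    if len(reads) == 2:
        cmd += ["-1", reads[0], "-2", reads[1]]  # paired-end mates
    else:
        cmd += ["-r", reads[0]]  # single-end (unmated) reads
    return cmd

def read_quant_sf(handle):
    """Map each transcript in quant.sf to (TPM, NumReads)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: (float(row["TPM"]), float(row["NumReads"])) for row in reader}

# with open("out/quant.sf") as fh: abundances = read_quant_sf(fh)
```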
BWA Generate Genome Index
Generates the index files needed to run BWA DNA alignment from a DNA FASTA file.
Genome DNA .fasta
BWA Mem
BWA is a software package for mapping sequences against a large reference genome, such as the human genome.
Short/long read .fastq (designed for short reads)
BWA Index
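The two BWA capsules above come down to two commands. `bwa index` runs on the reference FASTA. `bwa mem` then runs against that same FASTA path, which BWA uses as the index prefix. The sketch assumes a read group is wanted for downstream tools. The sample name is hypothetical. `bwa mem` writes SAM to standard output, so redirect it to a file.

```python
def bwa_cmds(reference, reads, sample, threads=8):
    """Return (index_cmd, mem_cmd); bwa mem writes SAM to stdout."""
    index_cmd = ["bwa", "index", reference]
    # Literal "\t" separators: bwa mem converts them to tabs in the @RG header line.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"
    mem_cmd = ["bwa", "mem", "-t", str(threads), "-R", read_group, reference, *reads]
    return index_cmd, mem_cmd

# with open("S1.sam", "w") as out: subprocess.run(mem_cmd, stdout=out, check=True)
```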
Bowtie2 Generate Genome Index
Generates the index files needed to run Bowtie2 DNA alignment from a DNA FASTA file.
Genome DNA .fasta
Bowtie2
Bowtie2 is a software package for aligning sequencing reads against a large reference genome, such as the human genome.
Short/long read .fastq (designed for short reads)
Bowtie2 Index
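BWA reuses the FASTA path as its index prefix. Bowtie2 differs: `bowtie2-build` writes the index under a separate basename, and `bowtie2 -x` expects that basename. The sketch below shows both steps. Paired reads use `-1`/`-2` and single-end reads use `-U`. The paths and thread count are assumptions.

```python
def bowtie2_cmds(reference, index_prefix, reads, out_sam, threads=8):
    """Return (build_cmd, align_cmd) for one sample."""
    build = ["bowtie2-build", reference, index_prefix]
    align = ["bowtie2", "-p", str(threads), "-x", index_prefix, "-S", out_sam]
    if len(reads) == 2:
        align += ["-1", reads[0], "-2", reads[1]]  # paired-end
    elif len(reads) == 1:
        align += ["-U", reads[0]]  # single-end
    else:
        raise ValueError("expected one or two FASTQ files")
    return build, align
```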
Single Cell
STAR-Solo Alignment
STAR-Solo analyzes droplet-based single-cell RNA sequencing data, for example from the 10x Genomics Chromium system. It is intended as a drop-in replacement for 10x Genomics' Cell Ranger.
Single cell RNA-seq .fastq
STAR Index
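STAR-Solo is enabled through STAR's `--solo*` options. A key detail is file order: the cDNA read comes first, then the barcode read. For 10x 3' libraries these are usually R2 and R1. The sketch below assumes a simple cell-barcode/UMI layout and gzipped FASTQ. The chemistry table holds the standard 10x v2 and v3 lengths. The function, dictionary and file names are hypothetical.

```python
# Cell-barcode and UMI lengths for common 10x chemistries (v2: 16 + 10, v3: 16 + 12).
CHEMISTRY = {"10x_v2": (16, 10), "10x_v3": (16, 12)}

def starsolo_cmd(genome_dir, cdna_fastq, barcode_fastq, whitelist,
                 chemistry="10x_v3", threads=8):
    """Build a STAR-Solo command for a droplet library (sketch)."""
    cb_len, umi_len = CHEMISTRY[chemistry]
    return [
        "STAR", "--runThreadN", str(threads), "--genomeDir", genome_dir,
        # cDNA read first, barcode (CB + UMI) read second.
        "--readFilesIn", cdna_fastq, barcode_fastq,
        "--readFilesCommand", "zcat",  # assumes gzipped FASTQ
        "--soloType", "CB_UMI_Simple",
        "--soloCBwhitelist", whitelist,
        "--soloCBlen", str(cb_len),
        "--soloUMIstart", str(cb_len + 1),  # UMI follows the cell barcode
        "--soloUMIlen", str(umi_len),
    ]
```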
RShiny Cell
ShinyCell is an R package that allows users to create interactive Shiny-based web applications to visualize single-cell data.
Single cell .rds inputs from Seurat (see README)
1-3. Single Cell Analysis Tutorial (Scanpy & Seurat)
Tutorials on working with single-cell data in Scanpy and Seurat:
1. Preprocessing and clustering 3k PBMCs
2. Core Plotting Functions
3. How to preprocess UMI count data with analytic Pearson residuals
Tutorial datasets (see README for details)
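Tutorial 3 uses analytic Pearson residuals, which Scanpy exposes as `scanpy.experimental.pp.normalize_pearson_residuals`. The sketch below is a plain NumPy version of the formula, not Scanpy's code. Expected counts come from a rank-one model, μ = (cell total × gene total) / grand total. Residuals are (x − μ) / √(μ + μ²/θ), clipped to ±√(number of cells). It assumes θ = 100 and this default clip, which match Scanpy's defaults to our understanding. All-zero genes are left at 0.

```python
import numpy as np

def pearson_residuals(counts, theta=100.0, clip=None):
    """Analytic Pearson residuals for a cells x genes UMI count matrix."""
    X = np.asarray(counts, dtype=float)
    # Expected counts: outer product of cell and gene totals over the grand total.
    mu = np.outer(X.sum(axis=1), X.sum(axis=0)) / X.sum()
    # Negative binomial variance with overdispersion parameter theta.
    var = mu + mu**2 / theta
    z = np.zeros_like(X)
    # Skip entries with zero expected variance (all-zero genes or cells).
    np.divide(X - mu, np.sqrt(var), out=z, where=var > 0)
    if clip is None:
        clip = np.sqrt(X.shape[0])
    return np.clip(z, -clip, clip)
```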
4. Single Cell Tutorial: Seurat to AnnData (Scanpy)
Tutorial demonstrating how to convert a Seurat object to an AnnData object for analysis in Scanpy.
Tutorial datasets (see README for details)
5-6. Single Cell Analysis Tutorial (Scanpy)
Tutorials demonstrating how to regress out cell-cycle effects and how to simulate data using a literature-curated Boolean gene regulatory network.
Tutorial datasets (see README for details)
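In Scanpy, cell-cycle regression is typically done in two steps. `sc.tl.score_genes_cell_cycle` adds `S_score` and `G2M_score`, and `sc.pp.regress_out` then removes them. The NumPy sketch below shows only the underlying idea, not Scanpy's implementation. Each gene is fit by ordinary least squares on the covariates plus an intercept, and the residuals are kept. The function name is hypothetical.

```python
import numpy as np

def regress_out(expression, covariates):
    """Residuals of a per-gene least-squares fit on covariates plus an intercept."""
    Y = np.asarray(expression, dtype=float)  # cells x genes
    C = np.asarray(covariates, dtype=float)
    if C.ndim == 1:
        C = C[:, None]  # single covariate -> one column
    design = np.column_stack([np.ones(C.shape[0]), C])
    # One lstsq call fits every gene (each column of Y) at once.
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return Y - design @ coef
```

The residuals are orthogonal to the intercept and to both scores, so no linear trend in S or G2M phase scores remains.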
7-10. Single Cell Analysis Tutorial (Scanpy) Advanced
Tutorials covering advanced single-cell processing.
Tutorial datasets (see README for details)
Utilities
Download data from BaseSpace
Download demultiplexed (fastq.gz) or raw (BCL) Illumina sequencing data through the Illumina BaseSpace CLI. This capsule requires a BaseSpace account and NGS data owned by or shared with the user.
None
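With the BaseSpace CLI (`bs`), demultiplexed FASTQ files are usually downloaded per project. Raw BCL data is downloaded per run. The sketch assumes the CLI has already been authenticated with `bs auth`. The project and run IDs are placeholders.

```python
def bs_download_cmd(kind, entity_id, out_dir):
    """kind: 'project' (demultiplexed FASTQ) or 'run' (raw BCL run folder)."""
    if kind not in ("project", "run"):
        raise ValueError("kind must be 'project' or 'run'")
    return ["bs", "download", kind, "-i", str(entity_id), "-o", out_dir]
```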
Sambamba Filtering (Duplicates, Multimappers, Unaligned)
Removes optical and PCR duplicates, multimapping reads, and unaligned reads from Illumina alignments using Sambamba. Sambamba's duplicate marking is intended as a faster drop-in replacement for Picard MarkDuplicates.
.bam alignment files
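One common Sambamba recipe runs `sambamba markdup` to flag duplicates first. `sambamba view` then applies a filter expression that drops duplicates, unmapped reads and likely multimappers. The `[XS] == null` test assumes BWA- or Bowtie2-style XS tags. STAR, for example, reports multimapping through the NH tag instead. The capsule's exact filter is not stated here, and the output file names are hypothetical.

```python
# Keep reads that are mapped, not flagged as duplicates, and carry no XS tag
# (BWA/Bowtie2 set XS when a secondary hit exists, i.e. a likely multimapper).
FILTER = "[XS] == null and not unmapped and not duplicate"

def sambamba_filter_cmds(in_bam, out_bam, threads=8):
    """Return (markdup_cmd, view_cmd): mark duplicates, then filter."""
    stem = in_bam[:-4] if in_bam.endswith(".bam") else in_bam
    marked = stem + ".markdup.bam"
    markdup = ["sambamba", "markdup", "-t", str(threads), in_bam, marked]
    view = ["sambamba", "view", "-h", "-t", str(threads), "-f", "bam",
            "-F", FILTER, "-o", out_bam, marked]
    return markdup, view
```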
Sambamba Sort and Index
Sorts and indexes Illumina alignment files using Sambamba. Sambamba is intended as a faster drop-in replacement for samtools.
.bam alignment files
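This step can be sketched as two Sambamba calls: a coordinate sort into a new file, then `sambamba index` on the sorted BAM. The `.sorted.bam` naming is an assumption for illustration.

```python
def sambamba_sort_index_cmds(in_bam, threads=8):
    """Return (sort_cmd, index_cmd) for one BAM file."""
    stem = in_bam[:-4] if in_bam.endswith(".bam") else in_bam
    sorted_bam = stem + ".sorted.bam"
    sort = ["sambamba", "sort", "-t", str(threads), "-o", sorted_bam, in_bam]
    # Index the sorted output so regions can be queried randomly.
    index = ["sambamba", "index", "-t", str(threads), sorted_bam]
    return sort, index
```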
Trim Galore
Trim Galore is a wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data.
.fastq files
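A typical Trim Galore call combines quality trimming, a FastQC report on the trimmed reads, and `--paired` when two mate files are given. `--rrbs` enables the RRBS-specific handling mentioned above. The sketch assumes a Phred quality cutoff of 20 and a hypothetical output directory.

```python
def trim_galore_cmd(reads, out_dir, quality=20, rrbs=False):
    """Build a Trim Galore command for one sample (one or two FASTQ files)."""
    cmd = ["trim_galore", "--quality", str(quality), "--fastqc", "--output_dir", out_dir]
    if rrbs:
        cmd.append("--rrbs")  # RRBS-specific trimming
    if len(reads) == 2:
        cmd.append("--paired")  # keep mates in sync
    cmd += list(reads)
    return cmd
```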
fastp
A tool for fast all-in-one preprocessing of FastQ files (quality filtering, adapter trimming, downsampling, etc.). It is written in C++ with multithreading support for high performance.
.fastq files
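fastp reads the first mate with `-i` and writes it with `-o`. The second mate uses `-I`/`-O`. HTML and JSON quality reports go to `-h`/`-j`. In this sketch the output naming scheme and thread count are assumptions. Adapter detection is left at fastp's defaults.

```python
def fastp_cmd(r1, r2, out_prefix, threads=8):
    """Build a fastp command; pass r2=None for single-end data."""
    cmd = ["fastp", "-i", r1, "-o", f"{out_prefix}_R1.trimmed.fastq.gz",
           "-w", str(threads),
           "-h", f"{out_prefix}.fastp.html", "-j", f"{out_prefix}.fastp.json"]
    if r2 is not None:
        cmd += ["-I", r2, "-O", f"{out_prefix}_R2.trimmed.fastq.gz"]
    return cmd
```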
Other
MACS PeakCalling
MACS3 is a peak-calling tool, generally used on ChIP-seq data to identify transcription factor binding sites.
.bam alignment files
compare_sheet.csv (see README)
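Each treatment/control comparison maps to one `macs3 callpeak` call. The layout of `compare_sheet.csv` is documented in the README and not reproduced here. The sketch therefore takes one pairing at a time as plain arguments. It assumes human genome size (`-g hs`) and a q-value cutoff of 0.05. `BAMPE` is used for paired-end libraries.

```python
def macs3_callpeak_cmd(treatment, control, name, out_dir,
                       genome_size="hs", paired=False, qvalue=0.05):
    """Build one `macs3 callpeak` command; control may be None."""
    cmd = ["macs3", "callpeak", "-t", treatment,
           "-f", "BAMPE" if paired else "BAM",  # BAMPE uses real fragment sizes
           "-g", genome_size, "-n", name, "--outdir", out_dir, "-q", str(qvalue)]
    if control:
        cmd += ["-c", control]
    return cmd
```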
featureCounts
This capsule runs featureCounts from the Rsubread R package to generate a gene expression (read count) matrix.
Gene annotation .gtf file
.bam alignments
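Rsubread's `featureCounts()` returns the matrix as the `counts` element of an R list. The Python sketch below does not use that. It assumes instead the tab-separated table written by the command-line `featureCounts`. That table starts with a `#` comment line, followed by `Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length` and one count column per BAM. Whether the capsule exports this exact layout is an assumption.

```python
import csv

def read_featurecounts(handle):
    """Return (sample_names, {gene_id: [counts, ...]}) from a featureCounts table."""
    rows = csv.reader((line for line in handle if not line.startswith("#")),
                      delimiter="\t")
    header = next(rows)
    samples = header[6:]  # columns after Geneid, Chr, Start, End, Strand, Length
    matrix = {row[0]: [int(x) for x in row[6:]] for row in rows}
    return samples, matrix
```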
HOMER
HOMER provides an all-in-one program for peak annotation, annotatePeaks.pl. This capsule uses annotatePeaks.pl to annotate the coordinates in *.bed files with gene features.
.bed files containing peaks
Genome reference .fasta
Gene annotation .gtf file
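A minimal sketch of the annotation call, assuming the three inputs above are passed directly. `annotatePeaks.pl` accepts a path to a custom genome FASTA in place of a preconfigured HOMER genome name. It takes a custom annotation through `-gtf`. The annotated table is written to standard output. File names are placeholders.

```python
def homer_annotate_cmd(peaks_bed, genome_fasta, gtf):
    """Build an annotatePeaks.pl command with a custom genome FASTA and GTF."""
    return ["annotatePeaks.pl", peaks_bed, genome_fasta, "-gtf", gtf]

# with open("peaks.annotated.txt", "w") as out:
#     subprocess.run(homer_annotate_cmd("peaks.bed", "genome.fa", "genes.gtf"),
#                    stdout=out, check=True)
```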
Gene Enrichment Analysis (GEA)
This capsule provides a user-friendly Streamlit application for gene enrichment analysis. Results are retrieved from two widely used platforms, g:Profiler and PANTHER.
File containing gene names
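Over-representation tests of the kind these platforms run are commonly based on the hypergeometric distribution. The question is: if `query_size` genes are drawn from a universe of `universe` genes, how likely is an overlap of at least `overlap` with a term containing `term_size` genes? The sketch below computes that one-sided p-value exactly with integer arithmetic. It does not reproduce either service's statistics, and it omits the multiple-testing correction both apply.

```python
from math import comb

def hypergeom_enrichment_p(universe, term_size, query_size, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(universe, term_size, query_size)."""
    upper = min(term_size, query_size)
    # comb() returns 0 when k exceeds n, so impossible overlaps contribute nothing.
    hits = sum(comb(term_size, k) * comb(universe - term_size, query_size - k)
               for k in range(overlap, upper + 1))
    return hits / comb(universe, query_size)

# e.g. 12 of 300 query genes fall in a 150-gene term, out of 20,000 genes:
# p = hypergeom_enrichment_p(20000, 150, 300, 12)
```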
GATK RNAseq short variant discovery (SNPs + Indels)
Based on the GATK RNA-seq short variant discovery pipeline. Takes RNA-seq alignments as input and outputs a VCF file containing SNPs and indels.
.bam RNA alignments
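The RNA-specific core of GATK's workflow has three steps:

1. `SplitNCigarReads` splits reads at intron gaps (`N` in the CIGAR string).
2. `HaplotypeCaller` calls variants while ignoring soft-clipped bases.
3. `VariantFiltration` applies hard filters.

The sketch assumes a reference FASTA with its `.fai` and `.dict` files, which the capsule lists no input for. It also assumes BAMs that already carry read groups. Duplicate marking and base quality recalibration from the full best-practices workflow are omitted. The output names are hypothetical.

```python
def gatk_rnaseq_cmds(reference, bam, prefix):
    """Return the core GATK RNA-seq variant-calling commands, in order."""
    split_bam = f"{prefix}.split.bam"
    raw_vcf = f"{prefix}.vcf.gz"
    filtered_vcf = f"{prefix}.filtered.vcf.gz"
    return [
        # Split reads spanning introns into per-exon segments.
        ["gatk", "SplitNCigarReads", "-R", reference, "-I", bam, "-O", split_bam],
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", split_bam, "-O", raw_vcf,
         "--dont-use-soft-clipped-bases",
         "--standard-min-confidence-threshold-for-calling", "20"],
        # Flag SNP clusters and apply FS/QD hard filters.
        ["gatk", "VariantFiltration", "-R", reference, "-V", raw_vcf, "-O", filtered_vcf,
         "--cluster-window-size", "35", "--cluster-size", "3",
         "--filter-name", "FS", "--filter-expression", "FS > 30.0",
         "--filter-name", "QD", "--filter-expression", "QD < 2.0"],
    ]
```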
Delly somatic complete analysis
Structural variant (SV) prediction to discover, genotype and visualize deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data, using Delly's somatic (tumor vs. control) calling workflow.
Genome reference .fasta
.bam DNA alignment files
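In Delly's somatic workflow, `delly call` runs on the tumor and matched control BAMs together. `delly filter -f somatic` then needs a two-column, tab-separated sheet that labels each sample as `tumor` or `control`. The sample names must match the BAM read-group sample names. The sketch only builds the commands and the sheet text. The optional exclude file and all names are assumptions.

```python
def delly_somatic(reference, tumor_bam, control_bam, tumor_name, control_name,
                  prefix, exclude=None):
    """Return (call_cmd, samples_tsv_text, filter_cmd) for one tumor/control pair."""
    call = ["delly", "call", "-g", reference, "-o", f"{prefix}.bcf"]
    if exclude:
        call += ["-x", exclude]  # optional regions to skip (e.g. telomeres/centromeres)
    call += [tumor_bam, control_bam]
    samples_tsv = f"{tumor_name}\ttumor\n{control_name}\tcontrol\n"
    filt = ["delly", "filter", "-f", "somatic", "-s", f"{prefix}.samples.tsv",
            "-o", f"{prefix}.somatic.bcf", f"{prefix}.bcf"]
    return call, samples_tsv, filt
```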
Delly germline complete analysis
Structural variant (SV) prediction to discover, genotype and visualize deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data, using Delly's germline calling workflow.
Genome reference .fasta
.bam DNA alignment files
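For a germline cohort, Delly's documented workflow runs in four stages:

1. Discover SVs in each sample.
2. Merge the sites across samples.
3. Re-genotype every sample at the merged sites.
4. Combine the genotypes with `bcftools merge` and apply the germline filter.

Delly's documentation notes that the germline filter is meant for larger cohorts. The sketch below assumes a dictionary of hypothetical sample names mapped to BAM paths, plus fixed intermediate file names.

```python
def delly_germline(reference, bams, exclude=None):
    """Return Delly germline commands for a cohort given {sample_name: bam_path}."""
    excl = ["-x", exclude] if exclude else []
    cmds = []
    # 1. Discover SVs per sample.
    for name, bam in bams.items():
        cmds.append(["delly", "call", "-g", reference, *excl, "-o", f"{name}.bcf", bam])
    # 2. Merge SV sites across samples.
    cmds.append(["delly", "merge", "-o", "sites.bcf", *[f"{n}.bcf" for n in bams]])
    # 3. Genotype the merged sites in every sample.
    for name, bam in bams.items():
        cmds.append(["delly", "call", "-g", reference, *excl, "-v", "sites.bcf",
                     "-o", f"{name}.geno.bcf", bam])
    # 4. Combine genotypes, index, and apply the germline filter.
    cmds.append(["bcftools", "merge", "-m", "id", "-O", "b", "-o", "merged.bcf",
                 *[f"{n}.geno.bcf" for n in bams]])
    cmds.append(["bcftools", "index", "merged.bcf"])
    cmds.append(["delly", "filter", "-f", "germline", "-o", "germline.bcf", "merged.bcf"])
    return cmds
```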
ART-Simulation-Illumina
ART is a set of simulation tools to generate synthetic next-generation sequencing reads.
.fasta containing the sequence to simulate reads from
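The `art_illumina` call sets four things: the sequencing-system error profile (`-ss`), read length (`-l`) and fold coverage (`-f`). For paired-end runs it also sets the fragment-length mean and standard deviation (`-m`, `-s`). The sketch assumes the HiSeq 2500 profile (`HS25`), 150 bp reads and 20x coverage. It also assumes 200 ± 10 bp fragments. All of these are illustrative defaults, not the capsule's settings.

```python
def art_illumina_cmd(fasta, out_prefix, read_length=150, coverage=20, paired=True,
                     fragment_mean=200, fragment_sd=10, system="HS25"):
    """Build an art_illumina command for simulating reads from a FASTA."""
    cmd = ["art_illumina", "-ss", system, "-i", fasta, "-l", str(read_length),
           "-f", str(coverage), "-o", out_prefix]
    if paired:
        # Paired-end mode needs a fragment-length distribution.
        cmd += ["-p", "-m", str(fragment_mean), "-s", str(fragment_sd)]
    return cmd
```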
PySpark and EMR Serverless
This capsule runs an example PySpark job on Amazon EMR Serverless.
NOAA Global Surface Summary of Day dataset