RNASeq Quantification Pipeline
This RNASeq Pipeline aligns sequencing reads (single- or paired-end), sorts and indexes the alignment (.bam), counts features, and performs a differential gene expression analysis.
The Pipeline uses the following four Apps Library Capsules:
STAR Alignment
Sambamba Sort & Index
FeatureCounts
DESeq2
Code Ocean supplies the datasets needed to run the Pipeline in the codeocean-public-data S3 bucket. Create a Data Asset for each of the three locations below. For more details, see .
Example Sequencing Reads
Bucket Name: codeocean-public-data
Path: example_datasets/Normox
hg38 Annotation
Bucket Name: codeocean-public-data
Path: genomes/hg38_Annotation
hg38 STAR Index
Bucket Name: codeocean-public-data
Path: example_datasets/STAR_GRCh38_GENCODE_Release_21_Index/star_index/
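For reference, the three locations above can be kept together as S3 URIs, for example when recording them in a notebook. This is only a small sketch: the dictionary keys and the `s3_uri` helper are illustrative names and are not part of Code Ocean.

```python
# Bucket and paths are copied from the list above; the keys are illustrative labels.
BUCKET = "codeocean-public-data"

DATA_ASSETS = {
    "example_reads": "example_datasets/Normox",
    "hg38_annotation": "genomes/hg38_Annotation",
    "hg38_star_index": "example_datasets/STAR_GRCh38_GENCODE_Release_21_Index/star_index/",
}


def s3_uri(bucket: str, path: str) -> str:
    """Join a bucket name and a key prefix into an s3:// URI."""
    return f"s3://{bucket}/{path.lstrip('/')}"


if __name__ == "__main__":
    for name, path in DATA_ASSETS.items():
        print(f"{name}: {s3_uri(BUCKET, path)}")
```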
Click Manage Data Assets
Attach STAR Index, Annotation, and Example Sequencing Reads Data Assets.
The design matrix specifies metadata associated with the samples, e.g. tumor vs. normal, tissue type, etc. To create the design matrix:
Create a Folder named DesignMatrix
Create metadata.csv with the following three columns:
Run
Condition
Batch
Run should match the prefix of the .bam output file for each sample. Condition and Batch should record the metadata that DESeq2 needs to take into account to differentiate the samples.
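As an illustration, metadata.csv could be generated with a short script rather than typed by hand. The sketch below assumes four hypothetical samples (`sample1` to `sample4`) and hypothetical Condition and Batch values; replace them with your own. The `unmatched_runs` helper is likewise an illustrative name, used to check Run values against .bam file names.

```python
import csv
from pathlib import Path

# Hypothetical sample rows: Run values must match your own .bam prefixes.
ROWS = [
    {"Run": "sample1", "Condition": "normal", "Batch": "1"},
    {"Run": "sample2", "Condition": "normal", "Batch": "2"},
    {"Run": "sample3", "Condition": "tumor", "Batch": "1"},
    {"Run": "sample4", "Condition": "tumor", "Batch": "2"},
]


def write_design_matrix(folder: str, rows: list) -> Path:
    """Write metadata.csv with the Run, Condition, Batch columns into `folder`."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "metadata.csv"
    with out_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["Run", "Condition", "Batch"])
        writer.writeheader()
        writer.writerows(rows)
    return out_path


def unmatched_runs(rows: list, bam_names: list) -> list:
    """Return Run values that are not a prefix of any .bam file name."""
    return [
        row["Run"]
        for row in rows
        if not any(name.startswith(row["Run"]) for name in bam_names)
    ]
```

Calling `write_design_matrix("DesignMatrix", ROWS)` creates the folder if needed and writes the file. An empty list from `unmatched_runs` suggests every Run value has a corresponding .bam file.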
Add the following Code Ocean Apps Capsules to the Pipeline Builder area:
STAR Alignment
Sambamba Sort & Index
FeatureCounts
DESeq2
Configure the connection for each step as follows:
ReadsDataset to STAR Alignment is set to Default.
HG38 Star Index to STAR Alignment is set to Collect.
STAR Alignment to Sambamba Sort & Index is set to Default.
Sambamba Sort & Index to FeatureCounts is set to Collect.
Annotation Data Asset to FeatureCounts is set to Default.
FeatureCounts to DESeq2 is set to Default.
Set the destination to Counts_data.
Create and attach DesignMatrix - metadata.csv to DESeq2.
Set the Connection to Default and the destination to Counts_data.
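To double-check the wiring, the connections above can be written down as a small graph and sorted so that every source comes before the step it feeds. This is a bookkeeping sketch only: the node labels are shorthand chosen here, and the code does not describe how Code Ocean or Nextflow actually schedules the steps.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# (source, target, connection type) as listed in the steps above.
# Node labels are shorthand for the Data Assets and Capsules.
CONNECTIONS = [
    ("Example Sequencing Reads", "STAR Alignment", "Default"),
    ("STAR Index", "STAR Alignment", "Collect"),
    ("STAR Alignment", "Sambamba Sort & Index", "Default"),
    ("Sambamba Sort & Index", "FeatureCounts", "Collect"),
    ("Annotation", "FeatureCounts", "Default"),
    ("FeatureCounts", "DESeq2", "Default"),
    ("DesignMatrix", "DESeq2", "Default"),
]


def execution_order(connections):
    """Return one valid order in which every source precedes its targets."""
    sorter = TopologicalSorter()
    for source, target, _connection_type in connections:
        sorter.add(target, source)  # the target depends on the source
    return list(sorter.static_order())


if __name__ == "__main__":
    for step in execution_order(CONNECTIONS):
        print(step)
```

A `graphlib.CycleError` from `static_order` would indicate that the listed connections loop back on themselves.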
On the App Builder tab, click Create App. Click Create App again if prompted. Click Finish.
Configure the App Panel as follows.
Refer to the README in each Capsule to learn more about the parameters used.
nextflow
Contains logs describing the actions of Nextflow.
DESeq2_results.csv
A .csv file containing the results table.
MA_plot.png
An MA plot displays a log ratio (M) against an average (A) to visualize the differences between two groups. In general, we expect the expression of most genes to remain consistent between conditions, so the MA plot should resemble the shape of a trumpet, with most points lying along the horizontal line y = 0.
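For intuition, the classic definition of the two coordinates is M = log2(a) - log2(b) and A = (log2(a) + log2(b)) / 2 for a gene's expression a and b in the two groups. The sketch below applies that definition to a single gene. The function name and the pseudocount are assumptions made for this example, and the axes of the plot produced by the DESeq2 Capsule may be defined somewhat differently.

```python
import math


def ma_values(mean_a: float, mean_b: float, pseudocount: float = 1.0):
    """Return (M, A) for one gene from its mean counts in two groups.

    M is the log2 ratio of the groups; A is the average of their log2 values.
    The pseudocount (an assumption of this sketch) avoids log2(0).
    """
    log_a = math.log2(mean_a + pseudocount)
    log_b = math.log2(mean_b + pseudocount)
    return log_a - log_b, (log_a + log_b) / 2.0


if __name__ == "__main__":
    print(ma_values(7, 7))   # equal expression in both groups
    print(ma_values(15, 3))  # higher expression in the first group
```

A gene with equal expression in both groups gets M = 0 and so sits on the horizontal line, while a gene that changes between conditions moves above or below it.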
PCA.png
Visualizes how the samples group by treatment.
volcano_plot.png
The volcano plot simultaneously captures the effect size and the significance of each tested gene.
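A volcano plot conventionally places the log2 fold change (effect size) on the x-axis and -log10 of the p-value (significance) on the y-axis. The sketch below computes those coordinates for one gene and labels it. The function names and both cutoffs are illustrative assumptions, not values taken from the Pipeline.

```python
import math


def volcano_point(log2_fold_change: float, p_value: float):
    """Return the (x, y) position of one gene on a volcano plot.

    p_value must be greater than 0, otherwise log10 is undefined.
    """
    return log2_fold_change, -math.log10(p_value)


def classify(log2_fold_change, p_value, lfc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene 'up', 'down', or 'ns' (not significant).

    Both cutoffs are illustrative defaults, not values taken from the pipeline.
    """
    if p_value >= p_cutoff or abs(log2_fold_change) < lfc_cutoff:
        return "ns"
    return "up" if log2_fold_change > 0 else "down"


if __name__ == "__main__":
    # Hypothetical gene: four-fold increase with a small p-value.
    print(volcano_point(2.0, 0.001), classify(2.0, 0.001))
```

Genes with both a large effect and a small p-value end up in the upper left and upper right corners of the plot.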
plots_by_gene
A folder containing one file per gene. Each file plots the normalized counts for that gene, giving an idea of how it behaves across the sample cohort.
Click Settings