nf-core RNASeq Tutorial
This tutorial demonstrates how to run nf-core/rnaseq on Code Ocean. nf-core/rnaseq is a bioinformatics pipeline for analysing RNA sequencing data obtained from organisms with a reference genome and annotation. It takes a samplesheet and FASTQ files as input, performs quality control (QC), trimming and (pseudo-)alignment, and produces a gene expression matrix and an extensive QC report.
First, create an Internal Data Asset of the sequencing reads. The Data Assets used in this tutorial can be imported from the public S3 bucket with the following bucket names and paths:
Bucket Name: codeocean-public-data
Path: example_datasets/Normox
Bucket Name: codeocean-public-data
Path: genomes/hg38/Reference/
Bucket Name: codeocean-public-data
Path: genomes/hg38_Annotation
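As a small illustration, the bucket names and paths listed above can be combined into standard s3:// URIs, which is the form most S3 tools expect. This is only a sketch: the helper name to_s3_uri is hypothetical, and nothing here contacts S3 or reflects how Code Ocean performs the import.

```python
# Bucket names and paths exactly as listed in this tutorial.
DATA_ASSET_SOURCES = [
    ("codeocean-public-data", "example_datasets/Normox"),
    ("codeocean-public-data", "genomes/hg38/Reference/"),
    ("codeocean-public-data", "genomes/hg38_Annotation"),
]


def to_s3_uri(bucket, path):
    """Join a bucket name and a path into an s3:// URI (hypothetical helper)."""
    return f"s3://{bucket}/{path.lstrip('/')}"


s3_uris = [to_s3_uri(bucket, path) for bucket, path in DATA_ASSET_SOURCES]
for uri in s3_uris:
    print(uri)
```

If the bucket permits anonymous listing, a URI in this form could be inspected with an S3 client before creating the Data Asset.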
From the Sidebar, create a new Pipeline by choosing Import from nf-core.
Search for rnaseq and select v3.14.0.
Click on Import to import the pipeline into your deployment.
Once the pipeline has been imported, you'll be greeted with its README file.
Click on Manage Data Assets
Search for and attach the following 3 Data Assets:
Normox-Sequencing
Gencode v42 Basic Annotation
hg38 Reference Sequence
Edit the sample sheet at /pipeline/assets/samplesheet.csv to specify the sample names, the location of read 1 and read 2 (if paired end), and the strandedness. The strandedness refers to the library preparation and must be one of unstranded, forward, reverse or auto; if set to auto, it is inferred automatically. Rows with the same sample identifier are considered technical replicates and merged automatically.
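The sample sheet layout described above can be sketched in a few lines of Python. The column names (sample, fastq_1, fastq_2, strandedness) follow the nf-core/rnaseq samplesheet format; the sample names, the file paths and the build_samplesheet helper are hypothetical and only serve as an illustration, so adjust them to where your attached Data Assets are actually mounted.

```python
import csv
import io

# Values accepted in the strandedness column, as listed in this tutorial.
VALID_STRANDEDNESS = {"unstranded", "forward", "reverse", "auto"}

# Column names of the nf-core/rnaseq samplesheet.
FIELDNAMES = ["sample", "fastq_1", "fastq_2", "strandedness"]


def build_samplesheet(rows):
    """Validate each row and return the samplesheet as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row["strandedness"] not in VALID_STRANDEDNESS:
            raise ValueError(f"invalid strandedness: {row['strandedness']!r}")
        if not row["fastq_1"]:
            raise ValueError(f"sample {row['sample']!r} has no read 1 file")
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical sample names and paths, for illustration only.
rows = [
    # Two rows sharing a sample identifier: treated as technical replicates.
    {"sample": "SAMPLE_A", "fastq_1": "/data/reads/A_L001_R1.fastq.gz",
     "fastq_2": "/data/reads/A_L001_R2.fastq.gz", "strandedness": "auto"},
    {"sample": "SAMPLE_A", "fastq_1": "/data/reads/A_L002_R1.fastq.gz",
     "fastq_2": "/data/reads/A_L002_R2.fastq.gz", "strandedness": "auto"},
    # Single-end sample: the fastq_2 column is left empty.
    {"sample": "SAMPLE_B", "fastq_1": "/data/reads/B_R1.fastq.gz",
     "fastq_2": "", "strandedness": "reverse"},
]

samplesheet_text = build_samplesheet(rows)
print(samplesheet_text)
```

Validating the strandedness value before writing the file catches typos early, rather than at pipeline start-up.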
In the App Panel, update the Input, Fasta and Gtf parameters according to the location of your data.
Click on Run or Run with Parameters
To use External Data Assets in a Pipeline, your deployment must be configured to support them. If it isn't configured or you're unsure whether it is, contact your Code Ocean admin.
Delete the value of the Igenomes Base parameter, as those resources are not used in this tutorial. See the nf-core documentation for instructions on how to use iGenomes resources.
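For orientation, the Input, Fasta and Gtf fields presumably correspond to the nf-core/rnaseq parameters input, fasta and gtf. The sketch below collects them in a dictionary and checks that none is empty before a run. The /data paths are hypothetical placeholders, check_params is an illustrative helper rather than part of Code Ocean or nf-core, and the App Panel remains the place where the values are actually set.

```python
import json

# Parameters this tutorial asks you to update in the App Panel.
REQUIRED_PARAMS = ("input", "fasta", "gtf")


def check_params(params):
    """Raise ValueError if a required parameter is missing or empty."""
    missing = [name for name in REQUIRED_PARAMS if not params.get(name)]
    if missing:
        raise ValueError(f"missing parameters: {', '.join(missing)}")
    return params


# The samplesheet path comes from this tutorial; the /data paths are
# hypothetical and depend on how the Data Assets are mounted.
params = check_params({
    "input": "/pipeline/assets/samplesheet.csv",
    "fasta": "/data/reference/genome.fa",
    "gtf": "/data/annotation/annotation.gtf",
})

params_json = json.dumps(params, indent=2)
print(params_json)
```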
The Results are available in the Pipeline Timeline, and a Data Asset can be created from them for downstream processing.