The Structure of a Compute Capsule
The Code Ocean Computational Workbench is divided into three vertical sections, from left to right: files, editor, and timeline.
The user interface of the compute capsule IDE (integrated development environment):
The File tab on the toolbar to the left opens the Files panel. Under Core Files are the metadata, environment, code, and data folders. Under Research Data Drive is the scratch folder, under Results is the results folder, and additional files are organized under Other Files.
The editor is the center panel where you can view and edit information that you select from the Files folders. Results files can be selected from the Reproducibility Panel on the right.
The timeline is the Reproducibility panel on the right, which provides a managed history of the Capsule and prompts you to commit changes and manage versions.
These are the essential research project components:
| Component | Description | Contents |
| --- | --- | --- |
| Metadata | Capsule's metadata | metadata.yml |
| Environment | Capsule's environment files | Dockerfile, postInstall script, environment.yml |
| Code | Code and related files | Source files |
| Data | Input data required by reproducible runs (RR). May optionally contain small intermediate data | Small data files and attached datasets |
| Results | Output files generated by reproducible runs | Result files |
| Scratch (CW) | Large intermediate data that is NOT required by reproducible runs | Large data files |
| Scratch (RR) | Large intermediate data that is generated during reproducible runs | Large data files |
| CW Root FS | Package installations, IDE preferences, temporary files, etc. | The whole file system except the capsule workspace and the mounted folders |
These appear in the Files panel as six folders. You can upload content to and download content from all of the folders except the results and scratch folders.
Follow the articles below to learn more details about the components:
Below are details of the size limitation, the path of the folder in the system, the design concepts, and the underlying mechanism for each of the folders.
The capsule workspace is limited to 5GB, which includes the files in the Metadata, Environment, Code, and Data folders.
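| Folder | Size limitation |
| --- | --- |
| Metadata | Counted toward the 5GB capsule workspace limit |
| Environment | Counted toward the 5GB capsule workspace limit |
| Code | Counted toward the 5GB capsule workspace limit |
| Data | Counted toward the 5GB capsule workspace limit. The limit doesn't apply to attached datasets, as they are mounted to the capsule |
| Results | Results is mounted and not counted as part of the capsule's workspace |
| Scratch (CW), Scratch (RR) | Scratch is mounted and not counted as part of the capsule's workspace |
| CW Root FS | 5GB limit, including existing capsule environment installations |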
The capsule size limit has been added to the bottom of the Files panel. To see the color-coded breakdown and informational text, hover the mouse cursor over Capsule Limit.
Capsule Limit and Root Limit have also been added in Cloud Workstations.
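As a rough way to see how the workspace folders add up against the 5GB limit from inside a Cloud Workstation, the following Python sketch sums file sizes under the four folders that count toward it. It is an illustration, not a Code Ocean tool: the `/root/capsule` root is taken from the path listing in this article, the function names are hypothetical, "5GB" is assumed to mean 5 * 1024**3 bytes, and skipping mount points is only an approximation of how attached datasets are excluded. The Capsule Limit indicator in the Files panel remains the authoritative figure.

```python
import os

# Assumption: "5GB" is read as 5 * 1024**3 bytes; the platform's own
# accounting may differ.
CAPSULE_LIMIT_BYTES = 5 * 1024**3

# Folders that the article says count toward the capsule workspace limit.
WORKSPACE_FOLDERS = ("metadata", "environment", "code", "data")


def dir_size_bytes(path):
    """Sum the sizes of regular files under path.

    Symlinks are not followed, and mounted subdirectories are skipped as an
    approximation of excluding attached datasets.
    """
    total = 0
    for root, dirs, files in os.walk(path):
        # Prune mount points in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if not os.path.ismount(os.path.join(root, d))]
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def workspace_usage(capsule_root="/root/capsule", folders=WORKSPACE_FOLDERS):
    """Return {folder: bytes}; folders that do not exist count as 0."""
    usage = {}
    for folder in folders:
        path = os.path.join(capsule_root, folder)
        usage[folder] = dir_size_bytes(path) if os.path.isdir(path) else 0
    return usage


def percent_of_limit(usage, limit=CAPSULE_LIMIT_BYTES):
    """Total usage as a percentage of the assumed limit."""
    return 100.0 * sum(usage.values()) / limit
```

Calling `percent_of_limit(workspace_usage())` in a Cloud Workstation terminal gives an estimate to compare with the color-coded breakdown shown under Capsule Limit.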
| Folder | | | |
| --- | --- | --- | --- |
| Metadata | /metadata | /root/capsule/metadata | - |
| Environment | /environment | /root/capsule/environment | - |
| Code | /code | /root/capsule/code | /code |
| Data | /data | /root/capsule/data | /data |
| Results | - | /root/capsule/results | /results |
| Scratch (CW) | - | /root/capsule/scratch | /scratch |
| Scratch (RR) | /scratch |
| CW Root FS | - | / | - |
Scratch (RR) is not the same folder as Scratch (CW). Scratch (RR) is emptied before the end of each run, so its content will not be visible in the capsule IDE.
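To make the role of each folder concrete, here is a minimal Python sketch of a run script that reads inputs from the data folder, keeps intermediate output in scratch, and writes the final output to results. It assumes the short paths `/data`, `/scratch`, and `/results` listed above are the ones visible to code during a run; the line-counting task, the `run` function, and the file names `line_counts.json` and `summary.txt` are hypothetical and chosen only for illustration.

```python
import json
from pathlib import Path


def run(data_dir="/data", scratch_dir="/scratch", results_dir="/results"):
    """Count lines in each .txt input and write a summary to the results folder."""
    data_dir = Path(data_dir)
    scratch_dir = Path(scratch_dir)
    results_dir = Path(results_dir)
    scratch_dir.mkdir(parents=True, exist_ok=True)
    results_dir.mkdir(parents=True, exist_ok=True)

    # Read-only inputs: small data files or attached datasets.
    counts = {}
    for path in sorted(data_dir.glob("*.txt")):
        with path.open() as handle:
            counts[path.name] = sum(1 for _ in handle)

    # Intermediate output: in a reproducible run this is emptied before the
    # run ends, so nothing needed afterwards should live only here.
    (scratch_dir / "line_counts.json").write_text(json.dumps(counts))

    # Final output: only files written to the results folder are kept.
    summary = results_dir / "summary.txt"
    summary.write_text(f"files={len(counts)} lines={sum(counts.values())}\n")
    return summary
```

The directories are parameters so the same function can be tried against temporary folders before it is pointed at the real paths.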
| Folder | Storage | Lifetime |
| --- | --- | --- |
| Metadata | EBS (local storage) | Capsule lifetime |
| Environment | EBS (local storage) | Capsule lifetime |
| Code | EBS (local storage) | Capsule lifetime |
| Data | Local data: EBS. Internal dataset: mounted from EFS. External dataset: mounted from S3 | Local data: capsule lifetime. Datasets: each has its own lifetime |
| Results | During computation: EBS. When the computation finishes, results are uploaded to S3 | Capsule lifetime unless explicitly deleted from the timeline |
| Scratch | Mounted from EFS | Capsule lifetime |
| CW Root FS | EBS (local storage) | Cloud Workstation session |
The source and size of the data may vary depending on the project. As covered in the previous table, there are different ways of bringing data into the capsule's Data folder. Depending on the complexity of the project, you may need a place to store intermediate results and use them as input data for further analysis; the Scratch folder is used for this purpose.
Below is the shareability of each type of dataset and the recommended practice for using it in the capsule, based on the design logic.
| Folder | Type | Shareability | Recommended practice |
| --- | --- | --- | --- |
| Data | Local (uploaded directly to the capsule) | Current capsule only | Small or example dataset for testing the capsule |
| Data | Internal dataset | Across capsules | The dataset is saved in the VPC's AWS storage. Works well with immutable data that only needs to be imported into Code Ocean's VPC once |
| Data | External dataset | Across capsules | Accessing the dataset requires an AWS credential. Works well with confidential datasets. The dataset can change if the source changes |
| Scratch (CW) | - | Current capsule only | Accessible only in the Cloud Workstation, for storing large intermediate datasets or output files. Usually converted into an internal dataset for sharing across capsules and for downstream analysis |
| Scratch (RR) | - | Current run only | Temporary storage during reproducible runs for large data that might exceed the capsule's size limit |
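One way to use Scratch (CW) for intermediate results is a simple compute-once cache, sketched below in Python. This is an assumption-laden illustration rather than a Code Ocean API: `cached_json` is a hypothetical helper, `/scratch` is taken from the path listing in this article, and JSON is used only because it keeps the example short. Because Scratch (RR) is emptied for each run, a cache like this only saves work within a Cloud Workstation.

```python
import json
from pathlib import Path


def cached_json(name, compute, scratch_dir="/scratch"):
    """Return the value stored under scratch_dir/name, computing it if absent.

    compute is a zero-argument callable returning JSON-serializable data.
    """
    cache_path = Path(scratch_dir) / name
    if cache_path.exists():
        # Reuse the intermediate result from an earlier step or session.
        return json.loads(cache_path.read_text())

    value = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so an interrupted write does not leave
    # a partial cache entry under the final name.
    tmp_path = cache_path.with_name(cache_path.name + ".tmp")
    tmp_path.write_text(json.dumps(value))
    tmp_path.replace(cache_path)
    return value
```

For example, `cached_json("features.json", build_features)` would call the hypothetical `build_features` once and read the stored file on later calls. Results that other capsules need would still have to be converted into an internal dataset, as the table recommends.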
Note: A Cloud Workstation can be shut down, or put on hold from the capsule Dashboard.
We recommend installing all the packages you need during the building phase. Please check the for more information.