Scratch
Working with Intermediate Data in Code Ocean
The scratch folder is a dedicated folder mounted to the capsule for working with large intermediate data in Code Ocean. It behaves differently in Cloud Workstation sessions and in Reproducible Runs. In both cases, the scratch folder is mounted EFS storage that is practically unlimited in size. It can be accessed using the absolute path /root/capsule/scratch or, from within the code folder, using the relative path ../scratch.
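The two paths mentioned above point at the same folder. As a minimal sketch, the following Python helper resolves the relative form against a code folder. It assumes the code folder lives at /root/capsule/code, which is inferred from the relative path and not stated explicitly here; the helper name `scratch_from_code_dir` is hypothetical.

```python
import os
from pathlib import Path

# Absolute path of the scratch folder, as given in the text above.
SCRATCH_ABSOLUTE = Path("/root/capsule/scratch")


def scratch_from_code_dir(code_dir):
    """Resolve ../scratch relative to the given code folder.

    The resolution is purely lexical (no filesystem access), so it can be
    tried outside a capsule as well.
    """
    return Path(os.path.normpath(Path(code_dir) / ".." / "scratch"))
```

Under that assumption, `scratch_from_code_dir("/root/capsule/code")` yields the same location as `SCRATCH_ABSOLUTE`, so either form can be used in scripts.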
Working with a large volume of data in a Cloud Workstation can significantly affect the performance of the capsule. Disk space is limited, and copying large volumes of data back and forth between the Cloud Workstation and the capsule is time-consuming and not recommended.
Cloud Workstation Scratch
For Cloud Workstation (CW) sessions, the scratch folder is a mounted drive whose contents persist for the lifetime of the capsule. Files written to scratch during a CW session are visible in the capsule IDE after the session is shut down and remain available in all subsequent sessions unless the user deletes them. These files are not available during a Reproducible Run.
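Because scratch contents carry over between CW sessions, one possible pattern is to reuse an expensive intermediate file when an earlier session already produced it. The sketch below is illustrative only: the helper `cached_bytes` and its parameters are hypothetical, and the scratch location is passed in by the caller.

```python
from pathlib import Path


def cached_bytes(scratch, name, compute):
    """Return the bytes stored at scratch/name, computing and saving them if absent."""
    path = Path(scratch) / name
    if path.exists():
        # The file was left behind by an earlier Cloud Workstation session.
        return path.read_bytes()
    # First use: compute the data and store it for later sessions.
    data = compute()
    path.write_bytes(data)
    return data
```

In a Reproducible Run the scratch folder starts empty, so a helper like this would always fall through to `compute()` there.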
The scratch folder is a convenient location to store large data before creating a data asset. Contents of the scratch folder can be made into a data asset from the capsule IDE or during a CW session.
Reproducible Run Scratch
For Reproducible Runs, the scratch folder functions as a temporary folder that is empty at the start of the run and is emptied at the end of the run. The capsule workspace (i.e., the core files, excluding data assets) is limited to 5 GB, so a Reproducible Run will fail if files created during the run push the workspace past this limit. The scratch folder can be used during a run to safely create files or work with intermediate data of any size. Because the folder is emptied at the end of each run, any results must be moved to the results folder before the run finishes.
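The workflow described above (write intermediates to scratch, then move the final output to the results folder) might look like the following sketch. The function `run_with_scratch`, the file names, and the toy computation are all hypothetical; the scratch and results locations are passed in as parameters because the results folder path is not given here.

```python
import shutil
from pathlib import Path


def run_with_scratch(scratch, results):
    """Build an intermediate file in scratch, then keep only the final output."""
    scratch, results = Path(scratch), Path(results)
    results.mkdir(parents=True, exist_ok=True)

    # Intermediate data is written to scratch instead of the capsule workspace.
    intermediate = scratch / "intermediate.txt"
    intermediate.write_text("\n".join(str(i * i) for i in range(1000)))

    # Derive the final output from the intermediate data, still inside scratch.
    total = sum(int(line) for line in intermediate.read_text().splitlines())
    summary = scratch / "summary.txt"
    summary.write_text(f"sum_of_squares={total}\n")

    # Scratch is emptied at the end of the run, so move what must survive.
    return Path(shutil.move(str(summary), str(results / "summary.txt")))
```

`shutil.move` falls back to copy-and-delete when the source and destination are on different filesystems, which is why it is used here instead of a plain rename.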
Scratch folder in Reproducible Run