Types of Data Assets
An internal dataset is a copy of the dataset stored on Code Ocean within the virtual private cloud deployment. The data can be brought in from a cloud provider, for example AWS or Google Cloud, and is saved on Code Ocean's server. Authorized users can download internal datasets from the Data Assets page and access or attach them in a capsule. These assets are saved on S3 and cached on EFS for quick access while they are in active use.
An external dataset is a link to data that is stored outside Code Ocean. To establish the link, AWS credentials must be provided during setup. Only the link is saved on Code Ocean's server. Only authorized users will have to provide the credentials to use the external dataset in a capsule. Because the data is not saved in Code Ocean, it cannot be downloaded directly, and workflows that use external data assets cannot be guaranteed to be reproducible. External data assets must point to a top-level directory, not to individual files.
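The storage and download differences between internal and external datasets described above can be summarized in a small model. This is only an illustrative sketch: `DatasetKind`, `DatasetTraits`, and `traits_for` are hypothetical names invented for this example and are not part of any Code Ocean API.

```python
from dataclasses import dataclass
from enum import Enum


class DatasetKind(Enum):
    """Hypothetical labels for the two dataset types described in the text."""
    INTERNAL = "internal"
    EXTERNAL = "external"


@dataclass(frozen=True)
class DatasetTraits:
    """Illustrative summary only; not a Code Ocean data structure."""
    stored_in_code_ocean: bool  # is the data itself saved on Code Ocean's server?
    downloadable: bool          # can it be downloaded from the Data Assets page?


def traits_for(kind: DatasetKind) -> DatasetTraits:
    """Return the traits the text attributes to each dataset kind."""
    if kind is DatasetKind.INTERNAL:
        # Internal: the data is copied into Code Ocean and can be downloaded.
        return DatasetTraits(stored_in_code_ocean=True, downloadable=True)
    # External: only the link is saved, so there is nothing to download directly.
    return DatasetTraits(stored_in_code_ocean=False, downloadable=False)
```

For example, `traits_for(DatasetKind.EXTERNAL).downloadable` evaluates to `False`, mirroring the statement that external data cannot be directly downloaded.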
A Captured Result is a data asset created from the result of a run. It records the origin of that result, including the capsule code version, the type of run, the input data assets, and the lineage graph. These assets are saved on S3 and cached on EFS for quick access while they are in active use.
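To make the recorded origin more concrete, the following sketch models the four pieces of information listed above as a plain record. It is an assumption-laden illustration, not Code Ocean's actual schema: the class `CapturedResultProvenance`, its field names, the helper `build_provenance`, and the representation of the lineage graph as simple (source, target) edges are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class CapturedResultProvenance:
    """Hypothetical record of where a captured result came from."""
    capsule_code_version: str                       # version of the capsule code that ran
    run_type: str                                   # type of run that produced the result
    input_data_assets: Tuple[str, ...] = ()         # names of the attached input assets
    lineage_edges: Tuple[Tuple[str, str], ...] = () # assumed form: (source, target) pairs


def build_provenance(
    capsule_code_version: str,
    run_type: str,
    inputs: Iterable[str],
    result_name: str,
) -> CapturedResultProvenance:
    """Build a record with one lineage edge from each input to the result."""
    input_names = tuple(inputs)
    edges = tuple((name, result_name) for name in input_names)
    return CapturedResultProvenance(
        capsule_code_version=capsule_code_version,
        run_type=run_type,
        input_data_assets=input_names,
        lineage_edges=edges,
    )
```

A call such as `build_provenance("v1", "example-run-type", ["reads", "reference"], "alignment")` uses made-up values throughout; it yields a record whose lineage links each input asset to the result.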
External Data Assets can be created from Capsule or Pipeline results that are located in a specified external S3 bucket. External Data Assets automatically generate lineage and provenance.