Data

This section explains how to use data in a Pipeline, including external Data Assets, and how to remove or replace data.

Was this helpful?

Adding Data to a Pipeline

You can add data to your Pipeline by attaching a Data Asset from the Manage button next to the /data folder or uploading data.

Once your data are in the File Tree system, you can drag and drop the folder into the Pipeline editor and connect it to a Capsule.

Uploaded (local) data files must be placed inside a folder before they can be dropped into the Pipeline editor. Using local files will increase run time compared to internal Data Assets. It is best practice to use Data Assets instead of local files because using Data Assets allows the same Data Asset to be easily reused in multiple Pipelines and Capsules, and enhances reproducibility by ensuring an accurate Lineage Graph.

The Pipeline will only use data that is dragged onto the Pipeline editor and connected to a Capsule.

Removing or Replacing Data from a Pipeline

The replace feature can be used to substitute a Data Asset while maintaining all connections and mappings.

Data Assets must be attached to the Pipeline's /data folder to appear in the Replace Data menu.

Using External or Combined Data Assets

To remove a Data Asset from the Pipeline editor you can hover over it and click the garbage can icon .

External Data Assets and Combined Data Assets can be used in a Pipeline. To use Data Assets linked to an AWS S3 bucket, a must be selected in the Pipeline Settings menu, regardless of whether the S3 bucket is private or public. See for more information.

custom IAM role

Pipeline Settings