From the Map Paths menu, the source and destination paths can be changed and a Connection Type can be selected. Configuring this menu correctly ensures that each capsule receives the data it needs and that the pipeline is parallelized efficiently.

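This section covers changing source and destination paths, connection type definitions, data asset to capsule connections, capsule to capsule connections, and organizing results.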

Changing Source and Destination Paths

Users can customize the flow of data in a pipeline by specifying source and destination paths. Source paths define which files are transferred to the destination capsule, while destination paths specify where those files are stored in the destination capsule. Multiple mappings can be added to a single connection to route different files to different locations.

For example, Capsule A generates many different types of files in its results folder. Using two source and destination mappings, one for .zip files and one for .html files, all files with those extensions are sent to Capsule B's data folder, in folders called zip_files and html_files, respectively. Files without a .zip or .html extension are not passed to Capsule B.
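The routing in this example can be modeled in a few lines of Python. This is a minimal sketch, not Code Ocean's implementation: the glob patterns `*.zip` and `*.html`, the `MAPPINGS` list, and the `route` helper are hypothetical stand-ins, and the real Map Paths syntax may differ. It only shows that each file either matches a mapping and lands in the mapped folder, or matches nothing and is not transferred.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical mappings: (source glob, destination folder inside Capsule B's data folder).
# The actual Map Paths syntax may differ; this only models the routing behavior.
MAPPINGS = [
    ("*.zip", "zip_files"),
    ("*.html", "html_files"),
]

def route(files, mappings):
    """Return {source file: destination path} for every file matched by a mapping."""
    routed = {}
    for f in files:
        name = PurePosixPath(f).name
        for pattern, dest in mappings:
            if fnmatchcase(name, pattern):
                routed[f] = str(PurePosixPath("data") / dest / name)
                break  # the two example patterns never overlap, so order does not matter here
    return routed  # unmatched files are simply absent: they are not transferred

results = ["report.html", "archive.zip", "log.txt", "plot.png"]
print(route(results, MAPPINGS))
```

Here log.txt and plot.png match neither pattern, so they never reach Capsule B.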

Connection Type Definitions

For the purposes of this guide, an "item" refers to a file or folder, as they are treated the same by each connection type.

Default

Data asset to capsule: each item is sent to its own parallel instance of the capsule.

Capsule to capsule: one destination capsule instance runs for each instance of the source capsule.

Collect

Data asset to capsule: the entire data asset will be made available to all parallel instances of the destination capsule.

Capsule to capsule: the source data will be made available as a whole to all parallel instances of the destination capsule.

If the destination capsule has only one input (a data asset or a capsule) and its Connection Type is Collect, only one instance of the destination capsule will run.
If the source data consists of a single item that all instances of the destination capsule need, Collect should be used. Otherwise, only one instance of the destination capsule will receive the source data.
Collect was formerly called Global.

Flatten

Data asset to capsule: same as Default.

Capsule to capsule: the source data is split so that each item is passed to its own parallel instance of the destination capsule.

Data Asset to Capsule Connections

This example shows how items from two data assets will be distributed across parallel instances of a capsule.

The left side shows a pipeline schematic in which two data assets are connected to a single capsule. Each data asset contains 3 items: nums contains 3 files, and alpha contains 2 files and 1 folder.

On the right is a table showing the distribution of input items across parallel capsule instances for each connection type combination.
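Since the table itself is not reproduced here, the following sketch models how the combinations could play out for two data assets with equal item counts. The item names in `NUMS` and `ALPHA` are hypothetical (the source only says "3 files" and "2 files and 1 folder"), and pairing Default items by position is an assumption; the platform's actual pairing order may differ. Flatten is omitted from the loop because, for data assets, it behaves the same as Default.

```python
from itertools import product

# Hypothetical item names; "c/" stands in for alpha's folder.
NUMS = ["1.txt", "2.txt", "3.txt"]
ALPHA = ["a.txt", "b.txt", "c/"]

def distribute(assets):
    """Model data-asset-to-capsule distribution.

    assets: list of (items, connection_type), connection_type in
    {"Default", "Collect", "Flatten"}. Returns one input list per
    parallel capsule instance.
    """
    split = [items for items, ctype in assets if ctype in ("Default", "Flatten")]
    whole = [item for items, ctype in assets if ctype == "Collect" for item in items]
    if not split:
        return [whole]  # only Collect inputs: a single instance sees everything
    if len({len(items) for items in split}) != 1:
        raise ValueError("unequal item counts are not modeled here")
    # Assumption: split items are paired by position; Collect items go to every instance.
    return [[items[i] for items in split] + whole for i in range(len(split[0]))]

for t1, t2 in product(["Default", "Collect"], repeat=2):
    instances = distribute([(NUMS, t1), (ALPHA, t2)])
    print(f"nums={t1}, alpha={t2}: {len(instances)} instance(s)")
    for inst in instances:
        print("   ", inst)
```

With both assets set to Default there are three instances, each receiving one item from each asset; with both set to Collect there is a single instance that receives all six items.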

Special Case: two data assets containing an unequal number of items, both with Default (or Flatten) selected

The number of parallel capsule instances equals the number of items in the data asset with fewer items. Items from the larger data asset are randomly assigned to those instances, and any leftover items are excluded from the computation.
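This special case can be sketched as follows. The `distribute_unequal` helper and the file names are hypothetical; the sketch only encodes the stated rule: one instance per item of the smaller asset, a random subset of the larger asset's items assigned to those instances, and the remaining items dropped. The `seed` parameter exists only to make the illustration reproducible.

```python
import random

def distribute_unequal(a, b, seed=None):
    """Pair items from two Default/Flatten data assets with unequal item counts.

    Returns (pairs, dropped): one (small_item, large_item) pair per instance,
    plus the items from the larger asset that were left out.
    """
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    rng = random.Random(seed)
    chosen = rng.sample(large, len(small))  # random subset of the larger asset, in random order
    dropped = [item for item in large if item not in chosen]
    return list(zip(small, chosen)), dropped

pairs, dropped = distribute_unequal(["1.txt", "2.txt"], ["a.txt", "b.txt", "c.txt", "d.txt"], seed=0)
print(len(pairs), "instances")
print("left out:", dropped)
```

Only two instances run, and two of the four items from the larger asset never reach the capsule.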

Capsule to Capsule Connections

This example shows how results generated by a source capsule (Capsule A) affect the execution of a destination capsule (Capsules B-F) when different connection types are used. It also shows how source mappings can be used in combination with connection types to further customize the pipeline execution.

In this scenario, there are 3 parallel instances of Capsule A, each producing the same output. In the figure, the number of capsule icons represents the number of parallel instances.
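To make the instance counts concrete, the sketch below applies the three connection type definitions to a single capsule-to-capsule connection. It assumes, hypothetically, that each of the 3 Capsule A instances writes two items (`out.csv` and `plot.png`) and that this connection is the destination capsule's only input. The Flatten branch reflects one reading of the definition (every item from every source instance gets its own destination instance), and the Collect branch simply concatenates results without modeling how same-named files are merged.

```python
# Hypothetical: each of the 3 Capsule A instances writes the same two items.
A_RESULTS = [["out.csv", "plot.png"] for _ in range(3)]

def capsule_to_capsule(source_instances, connection_type):
    """Return the inputs of each destination instance for a single connection."""
    if connection_type == "Default":
        # one destination instance per source instance
        return [list(results) for results in source_instances]
    if connection_type == "Collect":
        # only input + Collect: a single destination instance sees all source data
        return [[item for results in source_instances for item in results]]
    if connection_type == "Flatten":
        # every item from every source instance is passed to its own destination instance
        return [[item] for results in source_instances for item in results]
    raise ValueError(f"unknown connection type: {connection_type}")

for ctype in ("Default", "Collect", "Flatten"):
    print(ctype, len(capsule_to_capsule(A_RESULTS, ctype)))
```

Under these assumptions, Default yields 3 destination instances, Collect yields 1, and Flatten yields 6.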

Organizing Results

Write Files to a Folder in the Results Bucket

When many capsules are connected to the Results Bucket, it's helpful to write each capsule's output to a uniquely named folder. To do this, open the Map Paths menu and add a folder name to the destination path.

For example, adding Capsule_B to the destination path writes Capsule B's results to a folder called Capsule_B.

If multiple instances of Capsule B produce results with the same file names, those files overwrite each other in the Results Bucket unless Generate indexed folders is turned on.

Generate Indexed Folders

The Map Paths menu between a capsule and the Results Bucket has a Generate indexed folders switch. By default, files are written to the Results Bucket in the same way they’re written to the capsule’s results folder. Turning on Generate indexed folders writes files to a separate folder for each instance of the source capsule.

For example, if there are three instances of the source capsule and Generate indexed folders is on, results from each instance are written into separate folders named 1, 2, and 3.
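The sketch below shows why indexed folders prevent overwrites. The `bucket_paths` helper and `summary.csv` are hypothetical, and nesting the indexed folders inside the destination folder (Capsule_B/1/..., rather than 1/Capsule_B/...) is an assumption about the layout, not documented behavior. It reports the Results Bucket path of every file and lists any path written more than once.

```python
from pathlib import PurePosixPath

def bucket_paths(instance_results, dest_folder="", indexed=False):
    """Map each source instance's result files to Results Bucket paths.

    instance_results: one list of file names per source capsule instance.
    Returns (paths, overwritten), where overwritten lists paths written more than once.
    """
    paths, seen, overwritten = [], set(), []
    for index, files in enumerate(instance_results, start=1):
        base = PurePosixPath(dest_folder)
        if indexed:
            base = base / str(index)  # assumption: folders 1, 2, 3 nested in the destination folder
        for name in files:
            path = str(base / name)
            if path in seen:
                overwritten.append(path)  # a later instance overwrites an earlier file
            seen.add(path)
            paths.append(path)
    return paths, overwritten

runs = [["summary.csv"]] * 3  # three instances writing the same file name
print(bucket_paths(runs, "Capsule_B"))
print(bucket_paths(runs, "Capsule_B", indexed=True))
```

Without indexed folders, the second and third instances overwrite Capsule_B/summary.csv; with them, each instance's file lands in its own numbered folder.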