GNU Parallel Tutorial

This tutorial shows off much of GNU parallel's functionality. It is meant to teach you the options and syntax of GNU parallel, not to show realistic examples from the real world.

Reader's guide

If you prefer reading a book, buy GNU Parallel 2018 at https://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html or download it at: https://doi.org/10.5281/zenodo.1146014

Otherwise start by watching the intro videos for a quick introduction: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Then browse through the examples (man parallel_examples). That will give you an idea of what GNU parallel is capable of.

If you want to dive even deeper: spend a couple of hours walking through the tutorial (man parallel_tutorial). Your command line will love you for it.

Finally you may want to look at the rest of the manual (man parallel) if you have special needs not already covered.

If you want to know the design decisions behind GNU parallel, try: man parallel_design. This is also a good intro if you intend to change GNU parallel.

Prerequisites

To run this tutorial you must have the following:

parallel >= version 20160822

Install the newest version using your package manager (recommended for security reasons), the way described in README, or with this command:
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
   fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh
1234567851621b7f1ee103c00783aae4ef9889f8
$ md5sum install.sh
62eada78703b5500241b8e50baf62758
$ sha512sum install.sh
160d31599480cf5ca101512f150b7ac0206a65dc86f2bb6bbdf1a2bc96bc6d06
7f8237c20964b67fbccf8a93332528fa11e5ab432a6226a6ceb197ab7f03c061
$ bash install.sh
This will also install the newest version of the tutorial which you can see by running this:
man parallel_tutorial
Most of the tutorial will work on older versions, too.

abc-file:

The file can be generated by this command:
parallel -k echo ::: A B C > abc-file

def-file:

The file can be generated by this command:
parallel -k echo ::: D E F > def-file

abc0-file:

The file can be generated by this command:
perl -e 'printf "A\0B\0C\0"' > abc0-file

abc_-file:

The file can be generated by this command:
perl -e 'printf "A_B_C_"' > abc_-file

tsv-file.tsv

The file can be generated by this command:
perl -e 'printf "f1\tf2\nA\tB\nC\tD\n"' > tsv-file.tsv

num8

The file can be generated by this command:
perl -e 'for(1..8){print "$_\n"}' > num8

num128

The file can be generated by this command:
perl -e 'for(1..128){print "$_\n"}' > num128

num30000

The file can be generated by this command:
perl -e 'for(1..30000){print "$_\n"}' > num30000

num1000000

The file can be generated by this command:
perl -e 'for(1..1000000){print "$_\n"}' > num1000000

num_%header

The file can be generated by this command:
(echo %head1; echo %head2; \
 perl -e 'for(1..10){print "$_\n"}') > num_%header

fixedlen

The file can be generated by this command:
perl -e 'print "HHHHAAABBBCCC"' > fixedlen
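If perl is not at hand, the same input files can be created with bash and coreutils alone. This is a sketch that assumes bash and a seq command (GNU coreutils or compatible); it should produce the same bytes as the commands above, but compare the files yourself if that matters.

```shell
#!/usr/bin/env bash
# Sketch: create the tutorial's input files with bash builtins and seq
# instead of parallel/perl. Assumes a seq that prints one number per line.
set -euo pipefail

printf 'A\nB\nC\n' > abc-file
printf 'D\nE\nF\n' > def-file
printf 'A\0B\0C\0' > abc0-file          # NUL-delimited
printf 'A_B_C_' > abc_-file             # underscore-delimited
printf 'f1\tf2\nA\tB\nC\tD\n' > tsv-file.tsv
seq 8 > num8
seq 128 > num128
seq 30000 > num30000
seq 1000000 > num1000000
{ echo %head1; echo %head2; seq 10; } > num_%header
printf 'HHHHAAABBBCCC' > fixedlen       # no trailing newline
```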

For remote running: passwordless ssh login must work on the 2 servers named in $SERVER1 and $SERVER2.

SERVER1=server.example.com
SERVER2=server2.example.net

So you must be able to do this without entering a password:

ssh $SERVER1 echo works
ssh $SERVER2 echo works

It can be set up by running 'ssh-keygen -t ed25519; ssh-copy-id $SERVER1' and using an empty passphrase, or you can use ssh-agent. (Current OpenSSH versions no longer accept DSA keys by default.)

Input sources

GNU parallel reads input from input sources. These can be files, the command line, and stdin (standard input or a pipe).

A single input source

Input can be read from the command line:

parallel echo ::: A B C

Output (the order may be different because the jobs are run in parallel):

A
B
C

The input source can be a file:

parallel -a abc-file echo

Output: Same as above.

STDIN (standard input) can be the input source:

cat abc-file | parallel echo

Output: Same as above.

Multiple input sources

GNU parallel can take multiple input sources given on the command line. GNU parallel then generates all combinations of the input sources:

parallel echo ::: A B C ::: D E F

Output (the order may be different):

A D
A E
A F
B D
B E
B F
C D
C E
C F
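The set of combinations is the cartesian product of the input sources. A minimal bash sketch of that product, with the first input source as the outer loop (the order you get when adding -k), could look like this. `cross` is a hypothetical helper name, and it word-splits its arguments on purpose, so it only suits simple values like these.

```shell
#!/usr/bin/env bash
# Sketch: the cartesian product of two input sources, first source outermost.
# $1 and $2 are deliberately left unquoted so they split into words.
cross() {
  local a b
  for a in $1; do
    for b in $2; do
      echo "$a $b"
    done
  done
}

cross "A B C" "D E F"   # same jobs as: parallel -k echo ::: A B C ::: D E F
```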

The input sources can be files:

parallel -a abc-file -a def-file echo

Output: Same as above.

STDIN (standard input) can be one of the input sources using -:

cat abc-file | parallel -a - -a def-file echo

Output: Same as above.

Instead of -a, files can be given after :::: (use - for STDIN):

cat abc-file | parallel echo :::: - def-file

Output: Same as above.

::: and :::: can be mixed:

parallel echo ::: A B C :::: def-file

Output: Same as above.

Linking arguments from input sources

With --link you can link the input sources and get one argument from each input source:

parallel --link echo ::: A B C ::: D E F

Output (the order may be different):

A D
B E
C F

If one of the input sources is too short, its values will wrap:

parallel --link echo ::: A B C D E ::: F G

Output (the order may be different):

A F
B G
C F
D G
E F
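The wrapping rule can be sketched in bash: the number of jobs is the length of the longest input source, and each shorter source is indexed modulo its own length. `link_wrap` is a hypothetical helper written for illustration; it is not part of GNU parallel.

```shell
#!/usr/bin/env bash
# Sketch of --link's wrapping rule for two space-separated input sources.
link_wrap() {
  local -a x=($1) y=($2)             # word-split each source into an array
  local nx=${#x[@]} ny=${#y[@]} i n
  n=$(( nx > ny ? nx : ny ))         # the longest source sets the job count
  for (( i = 0; i < n; i++ )); do
    echo "${x[i % nx]} ${y[i % ny]}" # shorter sources wrap around
  done
}

link_wrap "A B C D E" "F G"
```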

For more flexible linking you can use :::+ and ::::+. They work like ::: and :::: except they link the previous input source to this input source.

This will link ABC to GHI:

parallel echo :::: abc-file :::+ G H I :::: def-file

Output (the order may be different):

A G D
A G E
A G F
B H D
B H E
B H F
C I D
C I E
C I F

This will link GHI to DEF:

parallel echo :::: abc-file ::: G H I ::::+ def-file

Output (the order may be different):

A G D
A H E
A I F
B G D
B H E
B I F
C G D
C H E
C I F

If one of the input sources is too short when using :::+ or ::::+, the rest will be ignored:

parallel echo ::: A B C D E :::+ F G

Output (the order may be different):

A F
B G

Changing the argument separator

GNU parallel can use other separators than ::: or ::::. This is typically useful if ::: or :::: is used in the command to run:

parallel --arg-sep ,, echo ,, A B C :::: def-file

Output (the order may be different):

A D
A E
A F
B D
B E
B F
C D
C E
C F

Changing the argument file separator:

parallel --arg-file-sep // echo ::: A B C // def-file

Output: Same as above.

Changing the argument delimiter

GNU parallel will normally treat a full line as a single argument: It uses \n as argument delimiter. This can be changed with -d:

parallel -d _ echo :::: abc_-file

Output (the order may be different):

A
B
C

NUL can be given as \0:

parallel -d '\0' echo :::: abc0-file

Output: Same as above.

A shorthand for -d '\0' is -0 (this will often be used to read files from find ... -print0):

parallel -0 echo :::: abc0-file

Output: Same as above.
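For comparison, bash's own read builtin can also split input on a custom delimiter, which may help when checking what -d or -0 will see. A sketch assuming bash, where an empty delimiter makes read stop at NUL; `read_delim` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: split a file on a custom delimiter with bash's read builtin.
# read -d '' stops at NUL, like parallel -0. A final item that is not
# followed by the delimiter would be dropped; both files below end with one.
read_delim() {
  local item
  while IFS= read -r -d "$2" item; do
    echo "$item"
  done < "$1"
}

printf 'A_B_C_' > abc_-file
printf 'A\0B\0C\0' > abc0-file
read_delim abc_-file _      # compare: parallel -d _ echo :::: abc_-file
read_delim abc0-file ''     # compare: parallel -0 echo :::: abc0-file
```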

End-of-file value for input source

GNU parallel can stop reading when it encounters a certain value:

parallel -E stop echo ::: A B stop C D

Output:

A
B

Skipping empty lines

Using --no-run-if-empty, GNU parallel will skip empty lines.

(echo 1; echo; echo 2) | parallel --no-run-if-empty echo

Output:

1
2

Building the command line

No command means arguments are commands

If no command is given after parallel the arguments themselves are treated as commands:

parallel ::: ls 'echo foo' pwd

Output (the order may be different):

[list of files in current dir]
foo
[/path/to/current/working/dir]

The command can be a script, a binary or a Bash function if the function is exported using export -f:

# Only works in Bash
my_func() {
  echo in my_func $1
}
export -f my_func
parallel my_func ::: 1 2 3

Output (the order may be different):

in my_func 1
in my_func 2
in my_func 3

Replacement strings

The 7 predefined replacement strings

GNU parallel has several replacement strings. If no replacement strings are used the default is to append {}:

parallel echo ::: A/B.C

Output:

A/B.C

The default replacement string is {}:

parallel echo {} ::: A/B.C

Output:

A/B.C

The replacement string {.} removes the extension:

parallel echo {.} ::: A/B.C

Output:

A/B

The replacement string {/} removes the path:

parallel echo {/} ::: A/B.C

Output:

B.C

The replacement string {//} keeps only the path:

parallel echo {//} ::: A/B.C

Output:

A

The replacement string {/.} removes the path and the extension:

parallel echo {/.} ::: A/B.C

Output:

B

The replacement string {#} gives the job number:

parallel echo {#} ::: A B C

Output (the order may be different):

1
2
3

The replacement string {%} gives the job slot number (between 1 and number of jobs to run in parallel):

parallel -j2 echo {%} ::: A B C

Output (the order may be different and 1 and 2 may be swapped):

1
2
1
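To make the path-related replacement strings concrete, here is a bash sketch that mimics {.}, {/}, {//} and {/.} for a single argument. The {.} and {/} versions follow the perl substitutions GNU parallel uses for them (listed under the Perl expression replacement string section); the {//} version relies on dirname(1) and may differ from perl's File::Basename in edge cases. The function names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: bash stand-ins for {.} {/} {//} {/.} (hypothetical function names).
rs_dot() {                        # {.}  remove the extension: s:\.[^/.]+$::
  local re='^(.*)\.[^/.]+$'
  if [[ $1 =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo "$1"                     # no extension after the last slash
  fi
}
rs_slash() { echo "${1##*/}"; }   # {/}  remove the path: s:.*/::
rs_dslash() { dirname -- "$1"; }  # {//} keep only the path
rs_slashdot() {                   # {/.} remove the path and the extension
  rs_dot "$(rs_slash "$1")"
}

arg=A/B.C
echo "{}=$arg {.}=$(rs_dot "$arg") {/}=$(rs_slash "$arg") {//}=$(rs_dslash "$arg") {/.}=$(rs_slashdot "$arg")"
```

Note that a dot in a directory name is left alone: the sketch returns dir.d/file unchanged, just as {.} does.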

Changing the replacement strings

The replacement string {} can be changed with -I:

parallel -I ,, echo ,, ::: A/B.C

Output:

A/B.C

The replacement string {.} can be changed with --extensionreplace:

parallel --extensionreplace ,, echo ,, ::: A/B.C

Output:

A/B

The replacement string {/} can be changed with --basenamereplace:

parallel --basenamereplace ,, echo ,, ::: A/B.C

Output:

B.C

The replacement string {//} can be changed with --dirnamereplace:

parallel --dirnamereplace ,, echo ,, ::: A/B.C

Output:

A

The replacement string {/.} can be changed with --basenameextensionreplace:

parallel --basenameextensionreplace ,, echo ,, ::: A/B.C

Output:

B

The replacement string {#} can be changed with --seqreplace:

parallel --seqreplace ,, echo ,, ::: A B C

Output (the order may be different):

1
2
3

The replacement string {%} can be changed with --slotreplace:

parallel -j2 --slotreplace ,, echo ,, ::: A B C

Output (the order may be different and 1 and 2 may be swapped):

1
2
1

Perl expression replacement string

When the predefined replacement strings are not flexible enough, a perl expression can be used instead. One example is to remove two extensions: foo.tar.gz becomes foo.

parallel echo '{= s:\.[^.]+$::;s:\.[^.]+$::; =}' ::: foo.tar.gz

Output:

foo

In {= =} you can access all of GNU parallel's internal functions and variables. A few are worth mentioning.

total_jobs() returns the total number of jobs:

parallel echo Job {#} of {= '$_=total_jobs()' =} ::: {1..5}

Output:

Job 1 of 5
Job 2 of 5
Job 3 of 5
Job 4 of 5
Job 5 of 5

Q(...) shell quotes the string:

parallel echo {} shell quoted is {= '$_=Q($_)' =} ::: '*/!#$'

Output:

*/!#$ shell quoted is \*/\!\#\$

skip() skips the job:

parallel echo {= 'if($_==3) { skip() }' =} ::: {1..5}

Output:

1
2
4
5

@arg contains the input source variables:

parallel echo {= 'if($arg[1]==$arg[2]) { skip() }' =} \
  ::: {1..3} ::: {1..3}

Output:

1 2
1 3
2 1
2 3
3 1
3 2

If the strings {= and =} cause problems they can be replaced with --parens:

parallel --parens ,,,, echo ',, s:\.[^.]+$::;s:\.[^.]+$::; ,,' \
  ::: foo.tar.gz

Output:

foo

To define a shorthand replacement string use --rpl:

parallel --rpl '.. s:\.[^.]+$::;s:\.[^.]+$::;' echo '..' \
  ::: foo.tar.gz

Output: Same as above.

If the shorthand starts with { it can be used as a positional replacement string, too:

parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{..}' ::: foo.tar.gz

Output: Same as above.

If the shorthand contains matching parentheses, the replacement string becomes a dynamic replacement string, and the string in the parentheses can be accessed as $$1. If there are multiple matching parentheses, the matched strings can be accessed using $$2, $$3 and so on.

You can think of this as giving arguments to the replacement string. Here we give the argument .tar.gz to the replacement string {%*string*} which removes string:

parallel --rpl '{%(.+?)} s/$$1$//;' echo {%.tar.gz}.zip ::: foo.tar.gz

Output:

foo.zip

Here we give the two arguments tar.gz and zip to the replacement string {/*string1*/*string2*} which replaces string1 with string2:

parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;' echo {/tar.gz/zip} \
  ::: foo.tar.gz

Output:

foo.zip

GNU parallel's 7 replacement strings are implemented like this:

--rpl '{} '
--rpl '{#} $_=$job->seq()'
--rpl '{%} $_=$job->slot()'
--rpl '{/} s:.*/::'
--rpl '{//} $Global::use{"File::Basename"} ||= eval "use File::Basename; 1;"; $_ = dirname($_);'
--rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
--rpl '{.} s:\.[^/.]+$::'

Positional replacement strings

With multiple input sources the argument from the individual input sources can be accessed with {number}:

parallel echo {1} and {2} ::: A B ::: C D

Output (the order may be different):

A and C
A and D
B and C
B and D

The positional replacement strings can also be modified using /, //, /., and .:

parallel echo /={1/} //={1//} /.={1/.} .={1.} ::: A/B.C D/E.F

Output (the order may be different):

/=B.C //=A /.=B .=A/B
/=E.F //=D /.=E .=D/E

If a position is negative, it will refer to the input source counted from behind:

parallel echo 1={1} 2={2} 3={3} -1={-1} -2={-2} -3={-3} \
  ::: A B ::: C D ::: E F

Output (the order may be different):

1=A 2=C 3=E -1=E -2=C -3=A
1=A 2=C 3=F -1=F -2=C -3=A
1=A 2=D 3=E -1=E -2=D -3=A
1=A 2=D 3=F -1=F -2=D -3=A
1=B 2=C 3=E -1=E -2=C -3=B
1=B 2=C 3=F -1=F -2=C -3=B
1=B 2=D 3=E -1=E -2=D -3=B
1=B 2=D 3=F -1=F -2=D -3=B

Positional perl expression replacement string

To use a perl expression as a positional replacement string, simply prepend the perl expression with the number and a space:

parallel echo '{=2 s:\.[^.]+$::;s:\.[^.]+$::; =} {1}' \
  ::: bar ::: foo.tar.gz

Output:

foo bar

If a shorthand defined using --rpl starts with { it can be used as a positional replacement string, too:

parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{2..} {1}' \
  ::: bar ::: foo.tar.gz

Output: Same as above.

Input from columns

The columns in a file can be bound to positional replacement strings using --colsep. Here the columns are separated by TAB (\t):

parallel --colsep '\t' echo 1={1} 2={2} :::: tsv-file.tsv

Output (the order may be different):

1=f1 2=f2
1=A 2=B
1=C 2=D

Header defined replacement strings

With --header GNU parallel will use the first value of the input source as the name of the replacement string. Only the non-modified version {} is supported:

parallel --header : echo f1={f1} f2={f2} ::: f1 A B ::: f2 C D

Output (the order may be different):

f1=A f2=C
f1=A f2=D
f1=B f2=C
f1=B f2=D

It is useful with --colsep for processing files with TAB separated values:

parallel --header : --colsep '\t' echo f1={f1} f2={f2} \
  :::: tsv-file.tsv

Output (the order may be different):

f1=A f2=B
f1=C f2=D
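A rough bash equivalent of --header : --colsep '\t' for a two-column file reads the header once and then pairs each value with its column name. This sketch assumes exactly two columns and no empty fields (TAB counts as whitespace for read, so consecutive TABs collapse); `tsv_rows` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: pair each TAB-separated value with the column name from line 1.
# Assumes two columns and no empty fields.
tsv_rows() {
  local h1 h2 v1 v2
  {
    IFS=$'\t' read -r h1 h2            # header line: the column names
    while IFS=$'\t' read -r v1 v2; do  # remaining lines: the values
      echo "$h1=$v1 $h2=$v2"
    done
  } < "$1"
}

printf 'f1\tf2\nA\tB\nC\tD\n' > tsv-file.tsv
tsv_rows tsv-file.tsv
```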

More pre-defined replacement strings with --plus

--plus adds the replacement strings {+/} {+.} {+..} {+...} {..} {...} {/..} {/...} {##}. The idea is that {+foo} matches the opposite of {foo}, so that {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} = {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}.

parallel --plus echo {} ::: dir/sub/file.ex1.ex2.ex3
parallel --plus echo {+/}/{/} ::: dir/sub/file.ex1.ex2.ex3
parallel --plus echo {.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
parallel --plus echo {+/}/{/.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
parallel --plus echo {..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
parallel --plus echo {+/}/{/..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
parallel --plus echo {...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
parallel --plus echo {+/}/{/...}.{+...} ::: dir/sub/file.ex1.ex2.ex3

Output (every command prints the same line):

dir/sub/file.ex1.ex2.ex3

{##} is simply the number of jobs:

parallel --plus echo Job {#} of {##} ::: {1..5}

Output:

Job 1 of 5
Job 2 of 5
Job 3 of 5
Job 4 of 5
Job 5 of 5

Dynamic replacement strings with --plus

--plus also defines these dynamic replacement strings:

{:-*string*}

Default value is string if the argument is empty.

{:*number*}

Substring from number till end of string.

{:*number1*:*number2*}

Substring from number1 to number2.

{#*string*}

If the argument starts with string, remove it.

{%*string*}

If the argument ends with string, remove it.

{/*string1*/*string2*}

Replace string1 with string2.

{^*string*}

If the argument starts with string, upper case it. string must be a single letter.

{^^*string*}

If the argument contains string, upper case it. string must be a single letter.

{,*string*}

If the argument starts with string, lower case it. string must be a single letter.

{,,*string*}

If the argument contains string, lower case it. string must be a single letter.

They are inspired by Bash:

unset myvar
echo ${myvar:-myval}
parallel --plus echo {:-myval} ::: "$myvar"

myvar=abcAaAdef
echo ${myvar:2}
parallel --plus echo {:2} ::: "$myvar"

echo ${myvar:2:3}
parallel --plus echo {:2:3} ::: "$myvar"

echo ${myvar#bc}
parallel --plus echo {#bc} ::: "$myvar"
echo ${myvar#abc}
parallel --plus echo {#abc} ::: "$myvar"

echo ${myvar%de}
parallel --plus echo {%de} ::: "$myvar"
echo ${myvar%def}
parallel --plus echo {%def} ::: "$myvar"

echo ${myvar/def/ghi}
parallel --plus echo {/def/ghi} ::: "$myvar"

echo ${myvar^a}
parallel --plus echo {^a} ::: "$myvar"
echo ${myvar^^a}
parallel --plus echo {^^a} ::: "$myvar"

myvar=AbcAaAdef
echo ${myvar,A}
parallel --plus echo '{,A}' ::: "$myvar"
echo ${myvar,,A}
parallel --plus echo '{,,A}' ::: "$myvar"

Output:

myval
myval
cAaAdef
cAaAdef
cAa
cAa
abcAaAdef
abcAaAdef
AaAdef
AaAdef
abcAaAdef
abcAaAdef
abcAaA
abcAaA
abcAaAghi
abcAaAghi
AbcAaAdef
AbcAaAdef
AbcAAAdef
AbcAAAdef
abcAaAdef
abcAaAdef
abcaaadef
abcaaadef

More than one argument

With --xargs GNU parallel will fit as many arguments as possible on a single line:

cat num30000 | parallel --xargs echo | wc -l

Output (if you run this under Bash on GNU/Linux):

2

The 30000 arguments fitted on 2 lines.

The maximal length of a single line can be set with -s. With a maximal line length of 10000 chars 17 commands will be run:

cat num30000 | parallel --xargs -s 10000 echo | wc -l

Output:

17

For better parallelism GNU parallel can distribute the arguments between all the parallel jobs when end of file is met.

In the example below, GNU parallel reads the last argument while generating the second job. When that happens, it spreads the arguments for the second job over 4 jobs instead, because 4 parallel jobs are requested.

The first job will be the same as the --xargs example above, but the second job will be split into 4 evenly sized jobs, resulting in a total of 5 jobs:

cat num30000 | parallel --jobs 4 -m echo | wc -l

Output (if you run this under Bash on GNU/Linux):

5

This is even more visible when running 4 jobs with 10 arguments. The 10 arguments are being spread over 4 jobs:

parallel --jobs 4 -m echo ::: 1 2 3 4 5 6 7 8 9 10

Output:

1 2 3
4 5 6
7 8
9 10
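The 3/3/2/2 split above is what you get when N arguments are spread over J jobs so that the chunk sizes differ by at most one. The arithmetic can be sketched in bash as follows; `split_even` is a hypothetical helper, and GNU parallel's own distribution logic may differ in other cases.

```shell
#!/usr/bin/env bash
# Sketch: spread N arguments over J jobs; chunk sizes differ by at most one.
split_even() {
  local jobs=$1
  shift
  local n=$# i start=1 size
  for (( i = 0; i < jobs && start <= n; i++ )); do
    # The first (n % jobs) chunks get one extra argument.
    size=$(( n / jobs + (i < n % jobs ? 1 : 0) ))
    echo "${*:start:size}"
    start=$(( start + size ))
  done
}

split_even 4 1 2 3 4 5 6 7 8 9 10
```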

A replacement string can be part of a word. -m will not repeat the context:

parallel --jobs 4 -m echo pre-{}-post ::: A B C D E F G

Output (the order may be different):

pre-A B-post
pre-C D-post
pre-E F-post
pre-G-post

To repeat the context use -X which otherwise works like -m:

parallel --jobs 4 -X echo pre-{}-post ::: A B C D E F G

Output (the order may be different):

pre-A-post pre-B-post
pre-C-post pre-D-post
pre-E-post pre-F-post
pre-G-post

To limit the number of arguments use -N:

parallel -N3 echo ::: A B C D E F G H

Output (the order may be different):

A B C
D E F
G H

-N also sets the positional replacement strings:

parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H

Output (the order may be different):

1=A 2=B 3=C
1=D 2=E 3=F
1=G 2=H 3=

-N0 reads 1 argument but inserts none:

parallel -N0 echo foo ::: 1 2 3

Output:

foo
foo
foo

Quoting

Command lines that contain special characters may need to be protected from the shell.

The perl program print "@ARGV\n" basically works like echo.

perl -e 'print "@ARGV\n"' A

Output:

A

To run that in parallel, the command needs to be quoted:

parallel perl -e 'print "@ARGV\n"' ::: This wont work

Output:

[Nothing]

To quote the command use -q:

parallel -q perl -e 'print "@ARGV\n"' ::: This works

Output (the order may be different):

This
works

Or you can quote the critical part using \':

parallel perl -e \''print "@ARGV\n"'\' ::: This works, too

Output (the order may be different):

This
works,
too

GNU parallel can also \-quote full lines. Simply run this:

parallel --shellquote
Warning: Input is read from the terminal. You either know what you
Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
Warning: ::: or :::: or to pipe data into parallel. If so
Warning: consider going through the tutorial: man parallel_tutorial
Warning: Press CTRL-D to exit.
perl -e 'print "@ARGV\n"'
[CTRL-D]

Output:

perl\ -e\ \'print\ \"@ARGV\\n\"\'

This can then be used as the command:

parallel perl\ -e\ \'print\ \"@ARGV\\n\"\' ::: This also works

Output (the order may be different):

This
also
works
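bash's printf %q does a similar job to --shellquote: it escapes a whole line so that the shell treats it as a single word. The exact escaping bash chooses can vary between bash versions, so this sketch checks the round trip rather than a specific quoted string.

```shell
#!/usr/bin/env bash
# Sketch: quote a whole command line as ONE shell word, similar in spirit
# to "parallel --shellquote". printf %q is a bash feature, not POSIX.
cmd='perl -e '\''print "@ARGV\n"'\'
quoted=$(printf '%q' "$cmd")
echo "$quoted"

# Evaluating the quoted form must give back exactly one word: the original line.
eval "set -- $quoted"
if [ "$#" -eq 1 ] && [ "$1" = "$cmd" ]; then
  echo "round trip OK"
fi
```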

Trimming space

Space can be trimmed on the arguments using --trim:

parallel --trim r echo pre-{}-post ::: ' A '

Output:

pre- A-post

To trim on the left side:

parallel --trim l echo pre-{}-post ::: ' A '

Output:

pre-A -post

To trim on both sides:

parallel --trim lr echo pre-{}-post ::: ' A '

Output:

pre-A-post
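The three --trim modes correspond to stripping leading and/or trailing whitespace. A sketch using only bash parameter expansion (the function names are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: whitespace trimming with bash parameter expansion only.
trim_l() {  # like --trim l: drop leading whitespace
  local s=$1
  printf '%s\n' "${s#"${s%%[![:space:]]*}"}"
}
trim_r() {  # like --trim r: drop trailing whitespace
  local s=$1
  printf '%s\n' "${s%"${s##*[![:space:]]}"}"
}
trim_lr() { trim_r "$(trim_l "$1")"; }  # like --trim lr

echo "pre-$(trim_r ' A ')-post"
echo "pre-$(trim_l ' A ')-post"
echo "pre-$(trim_lr ' A ')-post"
```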

Respecting the shell

This tutorial uses Bash as the shell. GNU parallel respects which shell you are using, so in zsh you can do:

parallel echo \={} ::: zsh bash ls

Output:

/usr/bin/zsh
/bin/bash
/bin/ls

In csh you can do:

parallel 'set a="{}"; if( { test -d "$a" } ) echo "$a is a dir"' ::: *

Output:

[somedir] is a dir

This also becomes useful if you use GNU parallel in a shell script: GNU parallel will use the same shell as the shell script.

Controlling the output

The output can be prefixed with the argument:

parallel --tag echo foo-{} ::: A B C

Output (the order may be different):

A       foo-A
B       foo-B
C       foo-C

To prefix it with another string use --tagstring:

parallel --tagstring {}-bar echo foo-{} ::: A B C

Output (the order may be different):

A-bar   foo-A
B-bar   foo-B
C-bar   foo-C

To see what commands will be run without running them use --dryrun:

parallel --dryrun echo {} ::: A B C

Output (the order may be different):

echo A
echo B
echo C

To print the commands before running them, use --verbose:

parallel --verbose echo {} ::: A B C

Output (the order may be different):

echo A
echo B
A
echo C
B
C

GNU parallel will postpone the output until the command completes:

parallel -j2 'printf "%s-start\n%s" {} {}; sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

Output:

2-start
2-middle
2-end
1-start
1-middle
1-end
4-start
4-middle
4-end

To get the output immediately use --ungroup:

parallel -j2 --ungroup 'printf "%s-start\n%s" {} {}; sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

Output:

4-start
42-start
2-middle
2-end
1-start
1-middle
1-end
-middle
4-end

--ungroup is fast, but can cause half a line from one job to be mixed with half a line of another job. That has happened in the second line, where the line '4-middle' is mixed with '2-start'.

To avoid this use --linebuffer:

parallel -j2 --linebuffer 'printf "%s-start\n%s" {} {}; sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

Output:

4-start
2-start
2-middle
2-end
1-start
1-middle
1-end
4-middle
4-end

To force the output in the same order as the arguments use --keep-order/-k:

parallel -j2 -k 'printf "%s-start\n%s" {} {}; sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

Output:

4-start
4-middle
4-end
2-start
2-middle
2-end
1-start
1-middle
1-end
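The effect of -k can be approximated in plain bash: give each job its own output file and print the files in argument order once all jobs are done. This sketch starts every job at once (there is no -j limit), prints nothing until the last job finishes, and assumes a sleep that accepts fractions (as GNU coreutils sleep does). It only illustrates the ordering; `keep_order` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: run every job in the background, capture each job's output in its
# own file, then print the files in argument order (like -k, but no -j limit).
keep_order() {
  local dir arg i=0 j
  dir=$(mktemp -d)
  for arg in "$@"; do
    i=$((i + 1))
    { sleep "$arg"; echo "$arg-end"; } > "$dir/$i" &
  done
  wait                      # wait for all background jobs
  for (( j = 1; j <= i; j++ )); do
    cat "$dir/$j"           # argument order, not finishing order
  done
  rm -rf -- "$dir"
}

keep_order 0.3 0.2 0.1
```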

Saving output into files

GNU parallel can save the output of each job into files:

parallel --files echo ::: A B C

Output will be similar to this:

/tmp/pAh6uWuQCg.par
/tmp/opjhZCzAX4.par
/tmp/W0AT_Rph2o.par

By default GNU parallel will cache the output in files in /tmp. This can be changed by setting $TMPDIR or --tmpdir:

parallel --tmpdir /var/tmp --files echo ::: A B C

Output will be similar to this:

/var/tmp/N_vk7phQRc.par
/var/tmp/7zA4Ccf3wZ.par
/var/tmp/LIuKgF_2LP.par

Or:

TMPDIR=/var/tmp parallel --files echo ::: A B C

Output: Same as above.

The output files can be saved in a structured way using --results:

parallel --results outdir echo ::: A B C

Output:

A
B
C

These files were also generated containing the standard output (stdout), standard error (stderr), and the sequence number (seq):

outdir/1/A/seq
outdir/1/A/stderr
outdir/1/A/stdout
outdir/1/B/seq
outdir/1/B/stderr
outdir/1/B/stdout
outdir/1/C/seq
outdir/1/C/stderr
outdir/1/C/stdout

--header : will take the first value as name and use that in the directory structure. This is useful if you are using multiple input sources:

parallel --header : --results outdir echo ::: f1 A B ::: f2 C D

Generated files:

outdir/f1/A/f2/C/seq
outdir/f1/A/f2/C/stderr
outdir/f1/A/f2/C/stdout
outdir/f1/A/f2/D/seq
outdir/f1/A/f2/D/stderr
outdir/f1/A/f2/D/stdout
outdir/f1/B/f2/C/seq
outdir/f1/B/f2/C/stderr
outdir/f1/B/f2/C/stdout
outdir/f1/B/f2/D/seq
outdir/f1/B/f2/D/stderr
outdir/f1/B/f2/D/stdout

The directories are named after the variables and their values.

Controlling the execution

Number of simultaneous jobs

The number of concurrent jobs is given with --jobs/-j:

/usr/bin/time parallel -N0 -j64 sleep 1 :::: num128

With 64 jobs in parallel the 128 sleeps will take 2-8 seconds to run - depending on how fast your machine is.

By default --jobs is the same as the number of CPU cores. So this:

/usr/bin/time parallel -N0 sleep 1 :::: num128

should take twice the time of running 2 jobs per CPU core:

/usr/bin/time parallel -N0 --jobs 200% sleep 1 :::: num128

--jobs 0 will run as many jobs in parallel as possible:

/usr/bin/time parallel -N0 --jobs 0 sleep 1 :::: num128

which should take 1-7 seconds depending on how fast your machine is.

--jobs can read from a file which is re-read when a job finishes:

echo 50% > my_jobs
/usr/bin/time parallel -N0 --jobs my_jobs sleep 1 :::: num128 &
sleep 1
echo 0 > my_jobs
wait

During the first second, only 50% of the CPU cores will run a job. Then 0 is put into my_jobs, and the rest of the jobs will be started in parallel.

Instead of basing the percentage on the number of CPU cores GNU parallel can base it on the number of CPUs:

parallel --use-cpus-instead-of-cores -N0 sleep 1 :::: num8

Shuffle job order

If you have many jobs (e.g. by multiple combinations of input sources), it can be handy to shuffle the jobs, so you get different values run. Use --shuf for that:

parallel --shuf echo ::: 1 2 3 ::: a b c ::: A B C

Output:

All combinations but different order for each run.

Interactivity

GNU parallel can ask the user if a command should be run using --interactive:

parallel --interactive echo ::: 1 2 3

Output:

echo 1 ?...y
echo 2 ?...n
1
echo 3 ?...y
3

GNU parallel can be used to put arguments on the command line for an interactive command such as emacs to edit one file at a time:

parallel --tty emacs ::: 1 2 3

Or give multiple arguments in one go to open multiple files:

parallel -X --tty vi ::: 1 2 3

A terminal for every job

Using --tmux GNU parallel can start a terminal for every job run:

seq 10 20 | parallel --tmux 'echo start {}; sleep {}; echo done {}'

This will tell you to run something similar to:

tmux -S /tmp/tmsrPrO0 attach

Using normal tmux keystrokes (CTRL-b n or CTRL-b p) you can cycle between windows of the running jobs. When a job is finished it will pause for 10 seconds before closing the window.

Timing

Some jobs do heavy I/O when they start. To avoid a thundering herd, GNU parallel can delay starting new jobs. --delay X will make sure there is at least X seconds between each start:

parallel --delay 2.5 echo Starting {}\;date ::: 1 2 3

Output:

Starting 1
Thu Aug 15 16:24:33 CEST 2013
Starting 2
Thu Aug 15 16:24:35 CEST 2013
Starting 3
Thu Aug 15 16:24:38 CEST 2013

If jobs taking more than a certain amount of time are known to fail, they can be stopped with --timeout. The accuracy of --timeout is 2 seconds:

parallel --timeout 4.1 sleep {}\; echo {} ::: 2 4 6 8

Output:

2
4

GNU parallel can compute the median runtime for jobs and kill those that take more than 200% of the median runtime:

parallel --timeout 200% sleep {}\; echo {} ::: 2.1 2.2 3 7 2.3

Output:

2.1
2.2
3
2.3
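The arithmetic behind --timeout 200% for the example above: the median of 2.1 2.2 3 7 2.3 is 2.3, so the limit is 4.6 seconds and only the 7 second job exceeds it. GNU parallel computes the median from the jobs that have completed so far while it runs; this sketch simply uses all five runtimes at once, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the arithmetic behind "--timeout 200%". Not GNU parallel's code.
median() {
  # Sort numerically in the C locale so "." is the decimal point.
  printf '%s\n' "$@" | LC_ALL=C sort -n |
    awk '{ v[NR] = $1 }
         END {
           if (NR % 2) { print v[(NR + 1) / 2] }
           else { m = (v[NR / 2] + v[NR / 2 + 1]) / 2; print m }
         }'
}
limit_for() { awk -v m="$(median "$@")" 'BEGIN { print 2 * m }'; }  # 200%

survivors() {   # print the runtimes that stay within the limit
  local limit t
  limit=$(limit_for "$@")
  for t in "$@"; do
    awk -v t="$t" -v l="$limit" 'BEGIN { exit !(t <= l) }' && echo "$t"
  done
}

survivors 2.1 2.2 3 7 2.3
```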

Progress information

Based on the runtime of completed jobs GNU parallel can estimate the total runtime:

parallel --eta sleep ::: 1 3 2 2 1 3 3 2 1

Output:

Computers / CPU cores / Max jobs to run
1:local / 2 / 2

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 2s 0left 1.11avg  local:0/9/100%/1.1s

GNU parallel can give progress information with --progress:

parallel --progress sleep ::: 1 3 2 2 1 3 3 2 1

Output:

Computers / CPU cores / Max jobs to run
1:local / 2 / 2

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:0/9/100%/1.1s

A progress bar can be shown with --bar:

parallel --bar sleep ::: 1 3 2 2 1 3 3 2 1

And a graphic bar can be shown with --bar and zenity:

seq 1000 | parallel -j10 --bar '(echo -n {};sleep 0.1)' \
  2> >(perl -pe 'BEGIN{$/="\r";$|=1};s/\r/\n/g' |
       zenity --progress --auto-kill --auto-close)

A logfile of the jobs completed so far can be generated with --joblog:

parallel --joblog /tmp/log exit ::: 1 2 3 0
cat /tmp/log

Output:

Seq Host Starttime      Runtime Send Receive Exitval Signal Command
1   :    1376577364.974 0.008   0    0       1       0      exit 1
2   :    1376577364.982 0.013   0    0       2       0      exit 2
3   :    1376577364.990 0.013   0    0       3       0      exit 3
4   :    1376577365.003 0.003   0    0       0       0      exit 0

The log contains the job sequence, which host the job was run on, the start time and run time, how much data was transferred, the exit value, the signal that killed the job, and finally the command being run.

With a joblog, GNU parallel can be stopped and later pick up where it left off. It is important that the input of the completed jobs is unchanged.

parallel --joblog /tmp/log exit ::: 1 2 3 0
cat /tmp/log
parallel --resume --joblog /tmp/log exit ::: 1 2 3 0 0 0
cat /tmp/log

Output:

Seq Host Starttime      Runtime Send Receive Exitval Signal Command
1   :    1376580069.544 0.008   0    0       1       0      exit 1
2   :    1376580069.552 0.009   0    0       2       0      exit 2
3   :    1376580069.560 0.012   0    0       3       0      exit 3
4   :    1376580069.571 0.005   0    0       0       0      exit 0

Seq Host Starttime      Runtime Send Receive Exitval Signal Command
1   :    1376580069.544 0.008   0    0       1       0      exit 1
2   :    1376580069.552 0.009   0    0       2       0      exit 2
3   :    1376580069.560 0.012   0    0       3       0      exit 3
4   :    1376580069.571 0.005   0    0       0       0      exit 0
5   :    1376580070.028 0.009   0    0       0       0      exit 0
6   :    1376580070.038 0.007   0    0       0       0      exit 0

Note how the start times of the last 2 jobs, which only the second run started, are clearly later than those of the first 4 jobs.

With --resume-failed GNU parallel will re-run the jobs that failed:

parallel --resume-failed --joblog /tmp/log exit ::: 1 2 3 0 0 0
cat /tmp/log

Output:

Seq Host Starttime      Runtime Send Receive Exitval Signal Command
1   :    1376580069.544 0.008   0    0       1       0      exit 1
2   :    1376580069.552 0.009   0    0       2       0      exit 2
3   :    1376580069.560 0.012   0    0       3       0      exit 3
4   :    1376580069.571 0.005   0    0       0       0      exit 0
5   :    1376580070.028 0.009   0    0       0       0      exit 0
6   :    1376580070.038 0.007   0    0       0       0      exit 0
1   :    1376580154.433 0.010   0    0       1       0      exit 1
2   :    1376580154.444 0.022   0    0       2       0      exit 2
3   :    1376580154.466 0.005   0    0       3       0      exit 3

Note how seq 1, 2, and 3 have been repeated because they had exit values different from 0.

--retry-failed does almost the same as --resume-failed. Where --resume-failed reads the commands from the command line (and ignores the commands in the joblog), --retry-failed ignores the command line and reruns the commands mentioned in the joblog.

parallel --retry-failed --joblog /tmp/log
cat /tmp/log

Output:

Seq Host Starttime      Runtime Send Receive Exitval Signal Command
1   :    1376580069.544 0.008   0    0       1       0      exit 1
2   :    1376580069.552 0.009   0    0       2       0      exit 2
3   :    1376580069.560 0.012   0    0       3       0      exit 3
4   :    1376580069.571 0.005   0    0       0       0      exit 0
5   :    1376580070.028 0.009   0    0       0       0      exit 0
6   :    1376580070.038 0.007   0    0       0       0      exit 0
1   :    1376580154.433 0.010   0    0       1       0      exit 1
2   :    1376580154.444 0.022   0    0       2       0      exit 2
3   :    1376580154.466 0.005   0    0       3       0      exit 3
1   :    1376580164.633 0.010   0    0       1       0      exit 1
2   :    1376580164.644 0.022   0    0       2       0      exit 2
3   :    1376580164.666 0.005   0    0       3       0      exit 3
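Because GNU parallel writes the joblog as a plain TAB-separated file, it is easy to inspect with standard tools. This sketch prints the jobs that have a non-zero Exitval or Signal, i.e. the ones --resume-failed would run again. The sample rows are copied from the first joblog above into a file named sample.log (an assumption for this example); with a real run, pass /tmp/log instead. `failed_jobs` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: list the failed jobs in a TAB-separated joblog.
# Columns: Seq Host Starttime Runtime Send Receive Exitval Signal Command
failed_jobs() {
  # Skip the header; report rows where Exitval ($7) or Signal ($8) is non-zero.
  awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $1 ": " $9 }' "$1"
}

{
  printf 'Seq\tHost\tStarttime\tRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n'
  printf '1\t:\t1376577364.974\t0.008\t0\t0\t1\t0\texit 1\n'
  printf '4\t:\t1376577365.003\t0.003\t0\t0\t0\t0\texit 0\n'
} > sample.log
failed_jobs sample.log
```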

Termination

Unconditional termination

By default GNU parallel will wait for all jobs to finish before exiting.

If you send GNU parallel the TERM signal, GNU parallel will stop spawning new jobs and wait for the remaining jobs to finish. If you send GNU parallel the TERM signal again, GNU parallel will kill all running jobs and exit.

Termination dependent on job status

For certain jobs there is no need to continue if one of the jobs fails and has an exit code different from 0. GNU parallel will stop spawning new jobs with --halt soon,fail=1:

parallel -j2 --halt soon,fail=1 echo {}\; exit {} ::: 0 0 1 2 3

Output:

0
0
1
parallel: This job failed:
echo 1; exit 1
parallel: Starting no more jobs. Waiting for 1 jobs to finish.
2

With --halt now,fail=1 the running jobs will be killed immediately:

parallel -j2 --halt now,fail=1 echo {}\; exit {} ::: 0 0 1 2 3

Output:

0
0
1
parallel: This job failed:
echo 1; exit 1

If --halt is given a percentage this percentage of the jobs must fail before GNU parallel stops spawning more jobs:

parallel -j2 --halt soon,fail=20% echo {}\; exit {} \
  ::: 0 1 2 3 4 5 6 7 8 9

Output:

0
1
parallel: This job failed:
echo 1; exit 1
2
parallel: This job failed:
echo 2; exit 2
parallel: Starting no more jobs. Waiting for 1 jobs to finish.
3
parallel: This job failed:
echo 3; exit 3

If you are looking for success instead of failures, you can use success. This will finish as soon as the first job succeeds:

parallel -j2 --halt now,success=1 echo {}\; exit {} ::: 1 2 3 0 4 5 6

Output:

1
2
3
0
parallel: This job succeeded:
echo 0; exit 0

GNU parallel can retry the command with --retries. This is useful if a command fails for unknown reasons now and then.

parallel -k --retries 3 \
  'echo tried {} >>/tmp/runs; echo completed {}; exit {}' ::: 1 2 0
cat /tmp/runs

Output:

completed 1
completed 2
completed 0

tried 1
tried 2
tried 1
tried 2
tried 1
tried 2
tried 0

Note how jobs 1 and 2 were tried 3 times, but 0 was not retried because it had exit code 0.
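The idea behind --retries can be sketched as a bash loop that runs a command up to N times in total and stops at the first success. `retry` and `flaky` are hypothetical helpers for illustration, not how GNU parallel implements it.

```shell
#!/usr/bin/env bash
# Sketch: run a command up to N times, stopping at the first success.
retry() {
  local tries=$1 n
  shift
  for (( n = 1; n <= tries; n++ )); do
    "$@" && return 0   # success: stop retrying
  done
  return 1             # every attempt failed
}

attempts=0
flaky() {              # hypothetical command: fails once, then succeeds
  attempts=$((attempts + 1))
  (( attempts >= 2 ))
}
retry 3 flaky && echo "succeeded after $attempts attempts"
```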

Termination signals (advanced)

Using --termseq you can control which signals are sent when killing children. Normally children will be killed by sending them SIGTERM, waiting 200 ms, then another SIGTERM, waiting 100 ms, then another SIGTERM, waiting 50 ms, then a SIGKILL, finally waiting 25 ms before giving up. It looks like this:

show_signals() {
  perl -e 'for(keys %SIG) {
    $SIG{$_} = eval "sub { print \"Got $_\\n\"; }";
  }
  while(1){sleep 1}'
}
export -f show_signals
echo | parallel --termseq TERM,200,TERM,100,TERM,50,KILL,25 \
  -u --timeout 1 show_signals

Output:

Got TERM
Got TERM
Got TERM

Or just:

echo | parallel -u --timeout 1 show_signals

Output: Same as above.

You can change this to SIGINT, SIGTERM, SIGKILL:

echo | parallel --termseq INT,200,TERM,100,KILL,25 \
  -u --timeout 1 show_signals

Output:

Got INT
Got TERM

The SIGKILL does not show because it cannot be caught, and thus the child dies.
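The difference between catchable and uncatchable signals can be reproduced without GNU parallel. The sketch below assumes Bash and a sleep that accepts fractional seconds (as GNU coreutils sleep does). It sends a child TERM three times and then KILL with plain kill, using longer pauses than the --termseq defaults so the demo is not timing-sensitive. It only mimics the signal sequence, not GNU parallel's code.

```shell
#!/usr/bin/env bash
# Sketch only: sends TERM, TERM, TERM, KILL with kill(1), like the default
# --termseq sequence, but with longer pauses between the signals.
log=$(mktemp)

# Child: append a line to the log for every SIGTERM it catches, keep running.
bash -c 'trap "echo Got TERM >> \"$1\"" TERM
         while :; do sleep 0.01; done' child "$log" &
pid=$!
sleep 0.2                         # give the child time to install its trap

for sig in TERM TERM TERM KILL; do
  kill -s "$sig" "$pid"
  sleep 0.2                       # time for the trap to run
done
wait "$pid" 2>/dev/null || true   # reap the child; SIGKILL cannot be trapped
cat "$log"                        # three "Got TERM" lines, nothing for KILL
```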

Limiting the resources

To avoid overloading systems GNU parallel can look at the system load before starting another job:

parallel --load 100% echo load is less than {} job per cpu ::: 1

Output:

[when the load is less than the number of cpu cores]
load is less than 1 job per cpu

GNU parallel can also check if the system is swapping.

parallel --noswap echo the system is not swapping ::: now

Output:

[when the system is not swapping]
the system is not swapping now

Some jobs need a lot of memory, and should only be started when there is enough memory free. Using --memfree GNU parallel can check if there is enough memory free. Additionally, GNU parallel will kill off the youngest job if the free memory falls below 50% of the given size. The killed job will be put back on the queue and retried later.

parallel --memfree 1G echo will run if more than 1 GB is ::: free
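As a rough illustration of the kind of check --memfree performs, the sketch below reads MemAvailable from /proc/meminfo. It is Linux-only, and it is an assumption that this figure is comparable to what GNU parallel itself measures; enough_mem is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Linux-only sketch of a --memfree style test; GNU parallel's own measurement
# may differ. enough_mem is a hypothetical helper.
# enough_mem KB: succeed if MemAvailable in /proc/meminfo is at least KB.
enough_mem() {
  local avail_kb
  avail_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
  (( avail_kb >= $1 ))
}

if enough_mem $(( 1024 * 1024 )); then    # 1G = 1048576 kB
  echo "would start the job: more than 1 GB is free"
else
  echo "would wait: less than 1 GB is free"
fi
```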

GNU parallel can run the jobs with a nice value. This will work both locally and remotely.

parallel --nice 17 echo this is being run with nice -n ::: 17

Output:

this is being run with nice -n 17

Remote execution

GNU parallel can run jobs on remote servers. It uses ssh to communicate with the remote machines.

Sshlogin

The most basic sshlogin is -S host:

parallel -S $SERVER1 echo running on ::: $SERVER1

Output:

running on [$SERVER1]

To use a different username prepend the server with username@:

parallel -S username@$SERVER1 echo running on ::: username@$SERVER1

Output:

running on [username@$SERVER1]

The special sshlogin : is the local machine:

parallel -S : echo running on ::: the_local_machine

Output:

running on the_local_machine

If ssh is not in $PATH, it can be prepended to $SERVER1:

parallel -S '/usr/bin/ssh '$SERVER1 echo custom ::: ssh

Output:

custom ssh

The ssh command can also be given using --ssh:

parallel --ssh /usr/bin/ssh -S $SERVER1 echo custom ::: ssh

or by setting $PARALLEL_SSH:

export PARALLEL_SSH=/usr/bin/ssh
parallel -S $SERVER1 echo custom ::: ssh

Several servers can be given using multiple -S:

parallel -S $SERVER1 -S $SERVER2 echo ::: running on more hosts

Output (the order may be different):

running
on
more
hosts

Or they can be separated by ,:

parallel -S $SERVER1,$SERVER2 echo ::: running on more hosts

Output: Same as above.

Or newline:

# This gives a \n between $SERVER1 and $SERVER2
SERVERS="`echo $SERVER1; echo $SERVER2`"
parallel -S "$SERVERS" echo ::: running on more hosts

They can also be read from a file (replace user@ with the user on $SERVER2):

echo $SERVER1 > nodefile
# Force 4 cores, special ssh-command, username
echo 4//usr/bin/ssh user@$SERVER2 >> nodefile
parallel --sshloginfile nodefile echo ::: running on more hosts

Output: Same as above.

Every time a job finishes, the --sshloginfile will be re-read, so it is possible to both add and remove hosts while running.

The special --sshloginfile .. reads from ~/.parallel/sshloginfile.

To force GNU parallel to treat a server as having a given number of CPU cores, prepend the number of cores followed by / to the sshlogin:

parallel -S 4/$SERVER1 echo force {} cpus on server ::: 4

Output:

force 4 cpus on server
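The sshlogin entries used so far combine an optional core count, an optional ssh command, and [user@]host. The hypothetical parse_sshlogin function below (not part of GNU parallel) splits an entry into those parts to make the format explicit. It assumes the login is the last word and ignores hostgroups and the special : login.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GNU parallel): split an sshlogin entry
# of the form [ncpus/][sshcommand ][user@]host into its parts.
parse_sshlogin() {
  local rest=$1 cores=auto ssh_cmd=ssh
  if [[ $rest =~ ^([0-9]+)/(.*)$ ]]; then
    cores=${BASH_REMATCH[1]}      # forced number of CPU cores
    rest=${BASH_REMATCH[2]}
  fi
  if [[ $rest == *' '* ]]; then
    ssh_cmd=${rest% *}            # everything before the last space
    rest=${rest##* }              # last word: [user@]host
  fi
  printf 'cores=%s ssh=%s login=%s\n' "$cores" "$ssh_cmd" "$rest"
}

parse_sshlogin '4//usr/bin/ssh user@server2'  # cores=4 ssh=/usr/bin/ssh login=user@server2
parse_sshlogin '4/server1'                    # cores=4 ssh=ssh login=server1
parse_sshlogin 'server1'                      # cores=auto ssh=ssh login=server1
```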

Servers can be put into groups by prepending @groupname to the server and the group can then be selected by appending @groupname to the argument if using --hostgroup:

parallel --hostgroup -S @grp1/$SERVER1 -S @grp2/$SERVER2 echo {} \
  ::: run_on_grp1@grp1 run_on_grp2@grp2

Output:

run_on_grp1
run_on_grp2

A host can be in multiple groups by separating the groups with +, and you can force GNU parallel to limit the groups on which the command can be run with -S@groupname:

parallel -S @grp1 -S @grp1+grp2/$SERVER1 -S @grp2/$SERVER2 echo {} \
  ::: run_on_grp1 also_grp1

Output:

run_on_grp1
also_grp1

Transferring files

GNU parallel can transfer the files to be processed to the remote host. It does that using rsync.

echo This is input_file > input_file
parallel -S $SERVER1 --transferfile {} cat ::: input_file

Output:

This is input_file

If the files are processed into another file, the resulting file can be transferred back:

echo This is input_file > input_file
parallel -S $SERVER1 --transferfile {} --return {}.out \
  cat {} ">"{}.out ::: input_file
cat input_file.out

Output: Same as above.

To remove the input and output file on the remote server use --cleanup:

echo This is input_file > input_file
parallel -S $SERVER1 --transferfile {} --return {}.out --cleanup \
  cat {} ">"{}.out ::: input_file
cat input_file.out

Output: Same as above.

There is a shorthand for --transferfile {} --return --cleanup called --trc:

echo This is input_file > input_file
parallel -S $SERVER1 --trc {}.out cat {} ">"{}.out ::: input_file
cat input_file.out

Output: Same as above.
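The life cycle that --trc sets up can be simulated locally to see what each part contributes. In this sketch a temporary directory stands in for the remote host and cp stands in for rsync over ssh, so it only illustrates the order of the steps, not the commands GNU parallel actually runs.

```shell
#!/usr/bin/env bash
# Local simulation of the --trc life cycle: transfer, run, return, clean up.
# A temporary directory plays the remote host; GNU parallel uses ssh + rsync.
set -e
local_dir=$(mktemp -d)
remote_dir=$(mktemp -d)                                  # pretend remote dir
cd "$local_dir"
echo This is input_file > input_file

cp input_file "$remote_dir/"                             # --transferfile {}
(cd "$remote_dir" && cat input_file > input_file.out)    # cat {} ">"{}.out
cp "$remote_dir/input_file.out" .                        # --return {}.out
rm -f "$remote_dir/input_file" "$remote_dir/input_file.out"   # --cleanup
cat input_file.out
```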

Some jobs need a common database for all jobs. GNU parallel can transfer that using --basefile, which will transfer the file before the first job:

echo common data > common_file
parallel --basefile common_file -S $SERVER1 \
  cat common_file\; echo {} ::: foo

Output:

common data
foo

To remove it from the remote host after the last job use --cleanup.

Working dir

The default working dir on the remote machines is the login dir. This can be changed with --workdir mydir.

Files transferred using --transferfile and --return will be relative to mydir on remote computers, and the command will be executed in the dir mydir.

The special mydir value ... will create working dirs under ~/.parallel/tmp on the remote computers. If --cleanup is given these dirs will be removed.

The special mydir value . uses the current working dir. If the current working dir is beneath your home dir, the value . is treated as the relative path to your home dir. This means that if your home dir is different on remote computers (e.g. if your login is different) the relative path will still be relative to your home dir.

parallel -S $SERVER1 pwd ::: ""
parallel --workdir . -S $SERVER1 pwd ::: ""
parallel --workdir ... -S $SERVER1 pwd ::: ""

Output:

[the login dir on $SERVER1]
[current dir relative on $SERVER1]
[a dir in ~/.parallel/tmp/...]

Avoid overloading sshd

If many jobs are started on the same server, sshd can be overloaded. GNU parallel can insert a delay between each job run on the same server:

parallel -S $SERVER1 --sshdelay 0.2 echo ::: 1 2 3

Output (the order may be different):

1
2
3

sshd will be less overloaded if using --controlmaster, which will multiplex ssh connections:

parallel --controlmaster -S $SERVER1 echo ::: 1 2 3

Output: Same as above.

Ignore hosts that are down

In clusters with many hosts a few of them are often down. GNU parallel can ignore those hosts. In this case the host 173.194.32.46 is down:

parallel --filter-hosts -S 173.194.32.46,$SERVER1 echo ::: bar

Output:

bar

Running the same commands on all hosts

GNU parallel can run the same command on all the hosts:

parallel --onall -S $SERVER1,$SERVER2 echo ::: foo bar

Output (the order may be different):

foo
bar
foo
bar

Often you will just want to run a single command on all hosts without arguments. --nonall is --onall with no arguments:

parallel --nonall -S $SERVER1,$SERVER2 echo foo bar

Output:

foo bar
foo bar

When --tag is used with --nonall and --onall the --tagstring is the host:

parallel --nonall --tag -S $SERVER1,$SERVER2 echo foo bar

Output (the order may be different):

$SERVER1 foo bar
$SERVER2 foo bar

--jobs sets the number of servers to log in to in parallel.

Transferring environment variables and functions

env_parallel is a shell function that transfers all aliases, functions, variables, and arrays. You activate it by running:

source `which env_parallel.bash`

Replace bash with the shell you use.

Now you can use env_parallel instead of parallel and still have your environment:

alias myecho=echo
myvar="Joe's var is"
env_parallel -S $SERVER1 'myecho $myvar' ::: green

Output:

Joe's var is green

The disadvantage is that if your environment is huge, env_parallel will fail.

When env_parallel fails, you can still use --env to tell GNU parallel to transfer an environment variable to the remote system.

MYVAR='foo bar'
export MYVAR
parallel --env MYVAR -S $SERVER1 echo '$MYVAR' ::: baz

Output:

foo bar baz

This works for functions, too, if your shell is Bash:

# This only works in Bash
my_func() {
  echo in my_func $1
}
export -f my_func
parallel --env my_func -S $SERVER1 my_func ::: baz

Output:

in my_func baz
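--env my_func only works when the function has been exported with export -f, as above. This local sketch (no GNU parallel, no remote host) shows what export -f itself does in Bash: a child bash process can see an exported function but not an unexported one.

```shell
#!/usr/bin/env bash
# Local demo of export -f in Bash: exported functions are visible in child
# bash processes, unexported ones are not.
my_func() {
  echo in my_func "$1"
}
export -f my_func
bash -c 'my_func baz'                     # prints: in my_func baz

not_exported() { echo hidden; }
bash -c 'not_exported' 2>/dev/null || echo 'child cannot see not_exported'
```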

GNU parallel can copy all user defined variables and functions to the remote system. It just needs to record which ones to ignore in ~/.parallel/ignored_vars. Do that by running this once:

parallel --record-env
cat ~/.parallel/ignored_vars

Output:

[list of variables to ignore - including $PATH and $HOME]

Now all other variables and functions defined will be copied when using --env _.

# The function is only copied if using Bash
my_func2() {
  echo in my_func2 $VAR $1
}
export -f my_func2
VAR=foo
export VAR

parallel --env _ -S $SERVER1 'echo $VAR; my_func2' ::: bar

Output:

foo
in my_func2 foo bar

If you use env_parallel the variables, functions, and aliases do not even need to be exported to be copied:

NOT='not exported var'
alias myecho=echo
not_ex() {
  myecho in not_exported_func $NOT $1
}
env_parallel --env _ -S $SERVER1 'echo $NOT; not_ex' ::: bar

Output:

not exported var
in not_exported_func not exported var bar

Showing what is actually run

--verbose will show the command that would be run on the local machine.

When using --cat, --pipepart, or when a job is run on a remote machine, the command is wrapped with helper scripts. -vv shows all of this.

parallel -vv --pipepart --block 1M wc :::: num30000

Output:

<num30000 perl -e 'while(@ARGV) { sysseek(STDIN,shift,0) || die;
$left = shift; while($read = sysread(STDIN,$buf, ($left > 131072
? 131072 : $left))){ $left -= $read; syswrite(STDOUT,$buf); } }'
0 0 0 168894 | (wc)
  30000   30000  168894

When the command gets more complex, the output is so hard to read that it is only useful for debugging:

my_func3() {
  echo in my_func $1 > $1.out
}
export -f my_func3
parallel -vv --workdir ... --nice 17 --env _ --trc {}.out \
  -S $SERVER1 my_func3 {} ::: abc-file

Output will be similar to:

(ssh server -- mkdir -p ./.parallel/tmp/aspire-1928520-1;rsync
--protocol 30 -rlDzR -essh ./abc-file
server:./.parallel/tmp/aspire-1928520-1);ssh server -- exec perl -e
\''@GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64");eval"@GNU_Parallel";my$eval=decode_base64(join"",@ARGV);eval$eval;'\'
c3lzdGVtKCJta2RpciIsIi1wIiwiLS0iLCIucGFyYWxsZWwvdG1wL2FzcGlyZS0xOTI4N
TsgY2hkaXIgIi5wYXJhbGxlbC90bXAvYXNwaXJlLTE5Mjg1MjAtMSIgfHxwcmludChTVE
BhcmFsbGVsOiBDYW5ub3QgY2hkaXIgdG8gLnBhcmFsbGVsL3RtcC9hc3BpcmUtMTkyODU
iKSAmJiBleGl0IDI1NTskRU5WeyJPTERQV0QifT0iL2hvbWUvdGFuZ2UvcHJpdmF0L3Bh
IjskRU5WeyJQQVJBTExFTF9QSUQifT0iMTkyODUyMCI7JEVOVnsiUEFSQUxMRUxfU0VRI
0BiYXNoX2Z1bmN0aW9ucz1xdyhteV9mdW5jMyk7IGlmKCRFTlZ7IlNIRUxMIn09fi9jc2
ByaW50IFNUREVSUiAiQ1NIL1RDU0ggRE8gTk9UIFNVUFBPUlQgbmV3bGluZXMgSU4gVkF
TL0ZVTkNUSU9OUy4gVW5zZXQgQGJhc2hfZnVuY3Rpb25zXG4iOyBleGVjICJmYWxzZSI7
YXNoZnVuYyA9ICJteV9mdW5jMygpIHsgIGVjaG8gaW4gbXlfZnVuYyBcJDEgPiBcJDEub
Xhwb3J0IC1mIG15X2Z1bmMzID4vZGV2L251bGw7IjtAQVJHVj0ibXlfZnVuYzMgYWJjLW
RzaGVsbD0iJEVOVntTSEVMTH0iOyR0bXBkaXI9Ii90bXAiOyRuaWNlPTE3O2RveyRFTlZ
MRUxfVE1QfT0kdG1wZGlyLiIvcGFyIi5qb2luIiIsbWFweygwLi45LCJhIi4uInoiLCJB
KVtyYW5kKDYyKV19KDEuLjUpO313aGlsZSgtZSRFTlZ7UEFSQUxMRUxfVE1QfSk7JFNJ
fT1zdWJ7JGRvbmU9MTt9OyRwaWQ9Zm9yazt1bmxlc3MoJHBpZCl7c2V0cGdycDtldmFse
W9yaXR5KDAsMCwkbmljZSl9O2V4ZWMkc2hlbGwsIi1jIiwoJGJhc2hmdW5jLiJAQVJHVi
JleGVjOiQhXG4iO31kb3skcz0kczwxPzAuMDAxKyRzKjEuMDM6JHM7c2VsZWN0KHVuZGV
mLHVuZGVmLCRzKTt9dW50aWwoJGRvbmV8fGdldHBwaWQ9PTEpO2tpbGwoU0lHSFVQLC0k
dW5sZXNzJGRvbmU7d2FpdDtleGl0KCQ/JjEyNz8xMjgrKCQ/JjEyNyk6MSskPz4+OCk=;
_EXIT_status=$?; mkdir -p ./.; rsync --protocol 30
--rsync-path=cd\ ./.parallel/tmp/aspire-1928520-1/./.\;\ rsync
-rlDzR -essh server:./abc-file.out ./.;ssh server --
\(rm\ -f\ ./.parallel/tmp/aspire-1928520-1/abc-file\;\ sh\ -c\ \'rmdir\ ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\ ./.parallel/\ 2\>/dev/null\'\;rm\ -rf\ ./.parallel/tmp/aspire-1928520-1\;\);ssh
server --
\(rm\ -f\ ./.parallel/tmp/aspire-1928520-1/abc-file.out\;\ sh\ -c\ \'rmdir\ ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\ ./.parallel/\ 2\>/dev/null\'\;rm\ -rf\ ./.parallel/tmp/aspire-1928520-1\;\);ssh
server -- rm -rf .parallel/tmp/aspire-1928520-1; exit $_EXIT_status;

Saving output to shell variables (advanced)

GNU parset will set shell variables to the output of GNU parallel. GNU parset has one important limitation: It cannot be part of a pipe. In particular this means it cannot read anything from standard input (stdin) or pipe output to another program.
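A plausible reason for the pipe limitation, assuming parset works by setting variables in the calling shell, is that Bash runs each part of a pipeline in a subshell, so variables set there are lost when the pipeline ends. This plain-bash sketch shows the effect with read:

```shell
#!/usr/bin/env bash
# In Bash each part of a pipeline runs in a subshell (unless shopt -s lastpipe
# is in effect), so a variable set inside the pipe does not survive it.
v=unset
echo hello | read -r v
echo "after pipe: $v"            # still "unset": read ran in a subshell

read -r v < <(echo hello)        # process substitution keeps read in this shell
echo "after redirect: $v"        # hello
```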

To use GNU parset, prepend the command with the destination variables:

parset myvar1,myvar2 echo ::: a b
echo $myvar1
echo $myvar2

Output:

a
b

If you only give a single variable, it will be treated as an array:

parset myarray seq {} 5 ::: 1 2 3
echo "${myarray[1]}"

Output:

2
3
4
5
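Each array element holds the whole output of one job. A sequential plain-bash sketch of the same result (no parallelism; note that $(...) strips trailing newlines, and whether parset does exactly the same is not shown here):

```shell
#!/usr/bin/env bash
# Sequential plain-bash analogue of: parset myarray seq {} 5 ::: 1 2 3
myarray=()
for i in 1 2 3; do
  myarray+=("$(seq "$i" 5)")     # whole output of one "job" per element
done
echo "${myarray[1]}"             # output of seq 2 5
```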

The commands to run can be an array:

cmd=("echo '<<joe \"double space\" cartoon>>'" "pwd")
parset data ::: "${cmd[@]}"
echo "${data[0]}"
echo "${data[1]}"

Output:

<<joe "double space" cartoon>>
[current dir]

Saving to an SQL base (advanced)

GNU parallel can save into an SQL base. Point GNU parallel to a table and it will put the joblog there together with the variables and the output each in their own column.

CSV as SQL base

The simplest is to use a CSV file as the storage table:

parallel --sqlandworker csv:///%2Ftmp/log.csv \
  seq ::: 10 ::: 12 13 14
cat /tmp/log.csv

Note how '/' in the path must be written as %2F.

Output will be similar to:

Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,Command,V1,V2,Stdout,Stderr
1,:,1458254498.254,0.069,0,9,0,0,"seq 10 12",10,12,"10
11
12
",
2,:,1458254498.278,0.080,0,12,0,0,"seq 10 13",10,13,"10
11
12
13
",
3,:,1458254498.301,0.083,0,15,0,0,"seq 10 14",10,14,"10
11
12
13
14
",

A proper CSV reader (like LibreOffice or R's read.csv) will read this format correctly - even with fields containing newlines as above.

If the output is big you may want to put it into files using --results:

parallel --results outdir --sqlandworker csv:///%2Ftmp/log2.csv \
  seq ::: 10 ::: 12 13 14
cat /tmp/log2.csv

Output will be similar to:

Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,Command,V1,V2,Stdout,Stderr
1,:,1458824738.287,0.029,0,9,0,0,"seq 10 12",10,12,outdir/1/10/2/12/stdout,outdir/1/10/2/12/stderr
2,:,1458824738.298,0.025,0,12,0,0,"seq 10 13",10,13,outdir/1/10/2/13/stdout,outdir/1/10/2/13/stderr
3,:,1458824738.309,0.026,0,15,0,0,"seq 10 14",10,14,outdir/1/10/2/14/stdout,outdir/1/10/2/14/stderr

DBURL as table

The CSV file is an example of a DBURL.

GNU parallel uses a DBURL to address the table. A DBURL has this format:

vendor://[[user][:password]@][host][:port]/[database[/table]]

Example:

mysql://scott:tiger@my.example.com/mydatabase/mytable
postgresql://scott:tiger@pg.example.com/mydatabase/mytable
sqlite3:///%2Ftmp%2Fmydatabase/mytable
csv:///%2Ftmp/log.csv

To refer to /tmp/mydatabase with sqlite or csv you need to encode the / as %2F.
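The encoding can be done with Bash parameter expansion. make_dburl is a hypothetical helper, not part of GNU parallel. It reads the csv example above as directory /tmp plus table log.csv, and it only replaces / with %2F, so paths containing % or other special characters would need more escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of GNU parallel: build a DBURL whose
# database part is a file-system path, encoding every / in it as %2F.
make_dburl() {
  local vendor=$1 dir=$2 table=$3
  printf '%s:///%s/%s\n' "$vendor" "${dir//\//%2F}" "$table"
}

make_dburl sqlite3 /tmp/mydatabase mytable   # sqlite3:///%2Ftmp%2Fmydatabase/mytable
make_dburl csv /tmp log.csv                  # csv:///%2Ftmp/log.csv
```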

Run a job using sqlite on mytable in /tmp/mydatabase:

DBURL=sqlite3:///%2Ftmp%2Fmydatabase
DBURLTABLE=$DBURL/mytable
parallel --sqlandworker $DBURLTABLE echo ::: foo bar ::: baz quuz

To see the result:

sql $DBURL 'SELECT * FROM mytable ORDER BY Seq;'

Output will be similar to:

Seq|Host|Starttime|JobRuntime|Send|Receive|Exitval|_Signal|Command|V1|V2|Stdout|Stderr
1|:|1451619638.903|0.806||8|0|0|echo foo baz|foo|baz|foo baz
|
2|:|1451619639.265|1.54||9|0|0|echo foo quuz|foo|quuz|foo quuz
|
3|:|1451619640.378|1.43||8|0|0|echo bar baz|bar|baz|bar baz
|
4|:|1451619641.473|0.958||9|0|0|echo bar quuz|bar|quuz|bar quuz
|

The first columns are well known from --joblog. V1 and V2 are data from the input sources. Stdout and Stderr are standard output and standard error, respectively.

Using multiple workers

Using an SQL base as storage costs an overhead on the order of 1 second per job.

One of the situations where it makes sense is if you have multiple workers.

You can then have a single master machine that submits jobs to the SQL base (but does not do any of the work):

parallel --sqlmaster $DBURLTABLE echo ::: foo bar ::: baz quuz

On the worker machines you run exactly the same command except you replace --sqlmaster with --sqlworker.

parallel --sqlworker $DBURLTABLE echo ::: foo bar ::: baz quuz

To run a master and a worker on the same machine use --sqlandworker as shown earlier.

--pipe

The --pipe functionality puts GNU parallel in a different mode: Instead of treating the data on stdin (standard input) as arguments for a command to run, the data will be sent to stdin (standard input) of the command.

The typical situation is:

command_A | command_B | command_C

where command_B is slow, and you want to speed up command_B.

Chunk size

By default GNU parallel will start an instance of command_B, read a chunk of 1 MB, and pass that to the instance. Then start another instance, read another chunk, and pass that to the second instance.

cat num1000000 | parallel --pipe wc

Output (the order may be different):

165668  165668 1048571
149797  149797 1048579
149796  149796 1048572
149797  149797 1048579
149797  149797 1048579
149796  149796 1048572
 85349   85349  597444

The size of the chunk is not exactly 1 MB because GNU parallel only passes full lines, never half a line; thus the block size is only 1 MB on average. You can change the block size to 2 MB with --block:

cat num1000000 | parallel --pipe --block 2M wc

Output (the order may be different):

315465  315465 2097150
299593  299593 2097151
299593  299593 2097151
 85349   85349  597444
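The rule that chunks end on a line boundary can be approximated with GNU coreutils split -C, which writes at most the given number of bytes per chunk without breaking lines. The sketch assumes num1000000 holds the numbers 1 to 1000000, one per line (the byte counts above add up to 6888896, which is the size of seq 1000000's output). split treats 1M as a strict maximum, so its chunk sizes will be close to, but not identical with, GNU parallel's.

```shell
#!/usr/bin/env bash
# Rough emulation of "parallel --pipe wc" with coreutils: split -C 1M cuts
# the input into chunks of at most 1 MiB that each end on a line boundary.
set -e
dir=$(mktemp -d)
seq 1000000 > "$dir/num1000000"     # assumed to match the tutorial's num1000000
split -C 1M "$dir/num1000000" "$dir/chunk."

total=0
for f in "$dir"/chunk.*; do
  wc < "$f"                                  # lines, words, bytes of one chunk
  total=$(( total + $(wc -l < "$f") ))
done
echo "total lines: $total"                   # no line is split or lost
```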

GNU parallel treats each line as a record. If the order of records is unimportant (e.g. you need all lines processed, but you do not care which is processed first), then you can use --roundrobin. Without --roundrobin GNU parallel will start a command per block; with --roundrobin only the requested number of jobs will be started (--jobs). The records will then be distributed between the running jobs:

cat num1000000 | parallel --pipe -j4 --roundrobin wc

Output will be similar to:

149797  149797 1048579
299593  299593 2097151
315465  315465 2097150
235145  235145 1646016

One of the 4 instances got a single record, 2 instances got 2 full records each, and one instance got 1 full and 1 partial record.

Records

GNU parallel sees the input as records. The default record is a single line.

Using -N140000, GNU parallel will read 140000 records at a time:

cat num1000000 | parallel --pipe -N140000 wc

Output (the order may be different):

140000  140000  868895
140000  140000  980000
140000  140000  980000
140000  140000  980000
140000  140000  980000
140000  140000  980000
140000  140000  980000
 20000   20000  140001

Note that the last job could not get the full 140000 lines, but only 20000 lines.
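For a fixed number of lines per chunk, GNU coreutils split -l gives the same grouping as -N140000 here: seven chunks of 140000 lines and a final chunk of 20000. This is only an analogy; the sketch assumes the same num1000000 contents (generated with seq) and processes the chunks one after another.

```shell
#!/usr/bin/env bash
# Rough emulation of "parallel --pipe -N140000 wc" with coreutils split -l:
# every chunk gets exactly 140000 lines except the last, which gets the rest.
set -e
dir=$(mktemp -d)
seq 1000000 > "$dir/num1000000"     # assumed to match the tutorial's num1000000
split -l 140000 "$dir/num1000000" "$dir/chunk."
for f in "$dir"/chunk.*; do
  wc < "$f"
done
```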

If a record is 75 lines, -L can be used:

cat num1000000 | parallel --pipe -L75 wc

Output (the order may be different):

165600  165600 1048095
149850  149850 1048950
149775  149775 1048425
149775  149775 1048425
149850  149850 1048950
149775  149775 1048425
 85350   85350  597450
    25      25     176

Note how GNU parallel still reads a block of around 1 MB, but instead of passing full lines to wc, it passes 75 full lines at a time. This of course does not hold for the last job (which in this case got 25 lines).

Fixed length records

Fixed length records can be processed by setting --recend '' and --block recordsize. A header of size n can be processed with --header .{n}.

Here is how to process a file with a 4-byte header and a 3-byte record size:

cat fixedlen | parallel --pipe --header .{4} --block 3 --recend '' \
  'echo start; cat; echo'

Output:

start
HHHHAAA
start
HHHHCCC
start
HHHHBBB

It may be more efficient to increase --block to a multiple of the record size.
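What --header .{4} --block 3 --recend '' does to the data can be sketched in plain bash: take the first 4 bytes as the header and prepend it to each 3-byte record. The fixedlen contents (HHHHAAABBBCCC) are an assumption that fits the output above. Bash variables cannot hold NUL bytes, so this only works for text, and it prints the records in input order, whereas the GNU parallel output above is unordered.

```shell
#!/usr/bin/env bash
# Plain-bash sketch of: --pipe --header .{4} --block 3 --recend ''
# Text input only: bash variables cannot hold NUL bytes.
cd "$(mktemp -d)"
printf 'HHHHAAABBBCCC' > fixedlen     # assumed contents: 4-byte header + 3 records

header=$(head -c 4 fixedlen)          # first 4 bytes
body=$(tail -c +5 fixedlen)           # everything after the header
for (( i = 0; i < ${#body}; i += 3 )); do
  echo start
  printf '%s%s\n' "$header" "${body:i:3}"   # header + one 3-byte record
done
```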

Record separators

GNU parallel uses separators to determine where two records split.

--recstart gives the string that starts a record; --recend gives the string that ends a record. The default is --recend '\n' (newline).

If both --recend and --recstart are given, then the record will only split if the recend string is immediately followed by the recstart string.

Here the --recend is set to ', ':

echo /foo, bar/, /baz, qux/, | \
  parallel -kN1 --recend ', ' --pipe echo JOB{#}\;cat\;echo END

Output:

JOB1
/foo, END
JOB2
bar/, END
JOB3
/baz, END
JOB4
qux/,
END
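The splitting rule for --recend can be written out in plain bash. split_recend is a hypothetical helper that cuts the input after every ", ", keeps the separator at the end of the record, and prints the same JOB/END framing as the example. The last record gets a newline of its own, standing in for the one echo adds at the end of the input.

```shell
#!/usr/bin/env bash
# Plain-bash sketch of --pipe -N1 --recend ', ': cut after every ", " and keep
# the separator with the record it ends. Hypothetical helper, text only.
split_recend() {
  local rest=$1 sep=', ' rec n=1
  while [[ $rest == *"$sep"* ]]; do
    rec=${rest%%"$sep"*}$sep      # text up to and including the first separator
    rest=${rest#*"$sep"}          # drop that record from the input
    printf 'JOB%d\n%sEND\n' "$n" "$rec"
    n=$(( n + 1 ))
  done
  printf 'JOB%d\n%s\nEND\n' "$n" "$rest"   # last record ends with a newline
}

split_recend '/foo, bar/, /baz, qux/,'
```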

Here the --recstart is set to /:

echo /foo, bar/, /baz, qux/, | \
  parallel -kN1 --recstart / --pipe echo JOB{#}\;cat\;echo END

Output:

JOB1
/foo, barEND
JOB2
/, END
JOB3
/baz, quxEND
JOB4
/,
END

Here both --recend and --recstart are set:

echo /foo, bar/, /baz, qux/, | \
  parallel -kN1 --recend ', ' --recstart / --pipe \
  echo JOB{#}\;cat\;echo END

Output:

JOB1
/foo, bar/, END
JOB2
/baz, qux/,
END

Note the difference between setting one string and setting both strings.

With --regexp the --recend and --recstart will be treated as a regular expression:

echo foo,bar,_baz,__qux, | \
  parallel -kN1 --regexp --recend ,_+ --pipe \
  echo JOB{#}\;cat\;echo END

Output:

JOB1
foo,bar,_END
JOB2
baz,__END
JOB3
qux,
END

GNU parallel can remove the record separators with --remove-rec-sep/--rrs:

echo foo,bar,_baz,__qux, | \
  parallel -kN1 --rrs --regexp --recend ,_+ --pipe \
  echo JOB{#}\;cat\;echo END

Output:

JOB1
foo,barEND
JOB2
bazEND
JOB3
qux,
END

--pipepart

--pipe is not very efficient. It maxes out at around 500 MB/s. --pipepart can easily deliver 5 GB/s. But there are a few limitations. The input has to be a normal file (not a pipe) given by -a or :::: and -L/-l/-N do not work. --recend and --recstart, however, do work, and records can often be split on that alone.

parallel --pipepart -a num1000000 --block 3m wc

Output (the order may be different):

444443  444443 3000002
428572  428572 3000004
126985  126984  888890

Shebang

Input data and parallel command in the same file

GNU parallel is often called like this:

cat input_file | parallel command

With --shebang the input_file and parallel can be combined into the same script.

UNIX shell scripts start with a shebang line like this:

#!/bin/bash

GNU parallel can do that, too. With --shebang the arguments can be listed in the file. The parallel command is the first line of the script:

#!/usr/bin/parallel --shebang -r echo

foo
bar
baz

Output (the order may be different):

foo
bar
baz

Parallelizing existing scripts

GNU parallel is often called like this:

cat input_file | parallel command
parallel command ::: foo bar

If command is a script, parallel can be combined into a single file so this will run the script in parallel:

cat input_file | command
command foo bar

This perl script perl_echo works like echo:

#!/usr/bin/perl

print "@ARGV\n"

It can be called like this:

parallel perl_echo ::: foo bar

By changing the #!-line it can be run in parallel:

#!/usr/bin/parallel --shebang-wrap /usr/bin/perl

print "@ARGV\n"

Thus this will work:

perl_echo foo bar

Output (the order may be different):

foo
bar

This technique can be used for:

Perl:

#!/usr/bin/parallel --shebang-wrap /usr/bin/perl

print "Arguments @ARGV\n";

Python:

#!/usr/bin/parallel --shebang-wrap /usr/bin/python3

import sys
print('Arguments', str(sys.argv))

Bash/sh/zsh/Korn shell:

#!/usr/bin/parallel --shebang-wrap /bin/bash

echo Arguments "$@"

csh:

#!/usr/bin/parallel --shebang-wrap /bin/csh

echo Arguments "$argv"

Tcl:

#!/usr/bin/parallel --shebang-wrap /usr/bin/tclsh

puts "Arguments $argv"

R:

#!/usr/bin/parallel --shebang-wrap /usr/bin/Rscript --vanilla --slave

args <- commandArgs(trailingOnly = TRUE)
print(paste("Arguments ",args))

GNUplot:

#!/usr/bin/parallel --shebang-wrap ARG={} /usr/bin/gnuplot

print "Arguments ", system('echo $ARG')

Ruby:

#!/usr/bin/parallel --shebang-wrap /usr/bin/ruby

print "Arguments "
puts ARGV

Octave:

#!/usr/bin/parallel --shebang-wrap /usr/bin/octave

printf ("Arguments");
arg_list = argv ();
for i = 1:nargin
  printf (" %s", arg_list{i});
endfor
printf ("\n");

Common LISP:

#!/usr/bin/parallel --shebang-wrap /usr/bin/clisp

(format t "~&~S~&" 'Arguments)
(format t "~&~S~&" *args*)

PHP:

#!/usr/bin/parallel --shebang-wrap /usr/bin/php
<?php
echo "Arguments";
foreach(array_slice($argv,1) as $v)
{
  echo " $v";
}
echo "\n";
?>

Node.js:

#!/usr/bin/parallel --shebang-wrap /usr/bin/node

var myArgs = process.argv.slice(2);
console.log('Arguments ', myArgs);

LUA:

#!/usr/bin/parallel --shebang-wrap /usr/bin/lua

io.write "Arguments"
for a = 1, #arg do
  io.write(" ")
  io.write(arg[a])
end
print("")

C#:

#!/usr/bin/parallel --shebang-wrap ARGV={} /usr/bin/csharp

var argv = Environment.GetEnvironmentVariable("ARGV");
print("Arguments "+argv);

Semaphore

GNU parallel can work as a counting semaphore. This is slower and less efficient than its normal mode.

A counting semaphore is like a row of toilets. People needing a toilet can use any toilet, but if there are more people than toilets, they will have to wait for one of the toilets to become available.

An alias for parallel --semaphore is sem.

sem will follow a person to the toilets, wait until a toilet is available, leave the person in the toilet and exit.

sem --fg will follow a person to the toilets, wait until a toilet is available, stay with the person in the toilet and exit when the person exits.

sem --wait will wait for all persons to leave the toilets.

sem does not have a queue discipline, so the next person is chosen randomly.

-j sets the number of toilets.

Mutex

The default is to have only one toilet (this is called a mutex). The program is started in the background and sem exits immediately. Use --wait to wait for all sems to finish:

sem 'sleep 1; echo The first finished' &&
  echo The first is now running in the background &&
  sem 'sleep 1; echo The second finished' &&
  echo The second is now running in the background
sem --wait

Output:

The first is now running in the background
The first finished
The second is now running in the background
The second finished

The command can be run in the foreground with --fg, which will only exit when the command completes:

sem --fg 'sleep 1; echo The first finished' &&
  echo The first finished running in the foreground &&
  sem --fg 'sleep 1; echo The second finished' &&
  echo The second finished running in the foreground
sem --wait

The difference between this and just running the command is that a mutex is set, so if other sems were running in the background, only one would run at a time.

To control which semaphore is used, use --semaphorename/--id. Run this in one terminal:

sem --id my_id -u 'echo First started; sleep 10; echo First done'

and simultaneously this in another terminal:

sem --id my_id -u 'echo Second started; sleep 10; echo Second done'

Note how the second will only be started when the first has finished.
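The same mutex behaviour can be mimicked with flock(1) from util-linux, which is Linux-specific. This is not how sem is implemented; the lock file stands in for the semaphore named my_id, and the pauses are shortened to keep the demo quick.

```shell
#!/usr/bin/env bash
# Rough flock(1) analogue of two "sem --id my_id" commands: the second job
# blocks until the first releases the lock. Not how sem works internally.
lock=$(mktemp)                       # stands in for the semaphore named my_id
out=$(mktemp)
( flock 9; echo First started; sleep 0.5; echo First done ) 9>"$lock" >>"$out" &
sleep 0.2                            # let the first job grab the lock
( flock 9; echo Second started; sleep 0.5; echo Second done ) 9>"$lock" >>"$out" &
wait
cat "$out"
```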

Counting semaphore

A mutex is like having a single toilet: When it is in use everyone else will have to wait. A counting semaphore is like having multiple toilets: Several people can use the toilets, but when they all are in use, everyone else will have to wait.

sem can emulate a counting semaphore. Use --jobs to set the number of toilets like this:

sem --jobs 3 --id my_id -u 'echo Start 1; sleep 5; echo 1 done' &&
sem --jobs 3 --id my_id -u 'echo Start 2; sleep 6; echo 2 done' &&
sem --jobs 3 --id my_id -u 'echo Start 3; sleep 7; echo 3 done' &&
sem --jobs 3 --id my_id -u 'echo Start 4; sleep 8; echo 4 done' &&
sem --wait --id my_id

Output:

Start 1
Start 2
Start 3
1 done
Start 4
2 done
3 done
4 done

Timeout

With --semaphoretimeout you can force running the command anyway after a period (positive number) or give up (negative number):

sem --id foo -u 'echo Slow started; sleep 5; echo Slow ended' &&
sem --id foo --semaphoretimeout 1 'echo Forced running after 1 sec' &&
sem --id foo --semaphoretimeout -2 'echo Give up after 2 secs'
sem --id foo --wait

Output:

Slow started
parallel: Warning: Semaphore timed out. Stealing the semaphore.
Forced running after 1 sec
parallel: Warning: Semaphore timed out. Exiting.
Slow ended

Note how the 'Give up' was not run.

Informational

GNU parallel has some options to give short information about the configuration.

--help will print a summary of the most important options:

parallel --help

Output:

Usage:

parallel [options] [command [arguments]] < list_of_arguments
parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
cat ... | parallel --pipe [options] [command [arguments]]

-j n            Run n jobs in parallel
-k              Keep same order
-X              Multiple arguments with context replace
--colsep regexp Split input on regexp for positional replacements
{} {.} {/} {/.} {#} {%} {= perl code =} Replacement strings
{3} {3.} {3/} {3/.} {=3 perl code =}    Positional replacement strings
With --plus:    {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
                {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}

-S sshlogin     Example: foo@server.example.com
--slf ..        Use ~/.parallel/sshloginfile as the list of sshlogins
--trc {}.bar    Shorthand for --transfer --return {}.bar --cleanup
--onall         Run the given command with argument on all sshlogins
--nonall        Run the given command with no arguments on all sshlogins

--pipe          Split stdin (standard input) to multiple jobs.
--recend str    Record end separator for --pipe.
--recstart str  Record start separator for --pipe.

See 'man parallel' for details

Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:

  O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
  ;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

When asking for help, always report the full output of this:

parallel --version

Output:

GNU parallel 20230122
Copyright (C) 2007-2025 Ole Tange, http://ole.tange.dk and Free Software
Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.

Website: https://www.gnu.org/software/parallel

When using programs that use GNU Parallel to process data for publication
please cite as described in 'parallel --citation'.

In scripts, --minversion can be used to ensure the user has at least a given version:

parallel --minversion 20130722 && \
  echo Your version is at least 20130722.

Output:

20160322
Your version is at least 20130722.

If you are using GNU parallel for research the BibTeX citation can be generated using --citation:

parallel --citation

Output:

Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:

@article{Tange2011a,
  title = {GNU Parallel - The Command-Line Power Tool},
  author = {O. Tange},
  address = {Frederiksberg, Denmark},
  journal = {;login: The USENIX Magazine},
  month = {Feb},
  number = {1},
  volume = {36},
  url = {https://www.gnu.org/s/parallel},
  year = {2011},
  pages = {42-47},
  doi = {10.5281/zenodo.16303}
}

(Feel free to use \nocite{Tange2011a})

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

If you send a copy of your published article to tange@gnu.org, it will be
mentioned in the release notes of next version of GNU Parallel.

With --max-line-length-allowed GNU parallel will report the maximal size of the command line:

parallel --max-line-length-allowed

Output (may vary on different systems):

--number-of-cpus and --number-of-cores run system specific code to determine the number of CPUs and CPU cores on the system. On unsupported platforms they will return 1:

parallel --number-of-cpus
parallel --number-of-cores

Output (may vary on different systems):

Profiles

The defaults for GNU parallel can be changed systemwide by putting the command line options in /etc/parallel/config. They can be changed for a user by putting them in ~/.parallel/config.

Profiles work the same way, but have to be referred to with --profile:

echo '--nice 17' > ~/.parallel/nicetimeout
echo '--timeout 300%' >> ~/.parallel/nicetimeout
parallel --profile nicetimeout echo ::: A B C

Output:

A
B
C

Profiles can be combined:

echo '-vv --dry-run' > ~/.parallel/dryverbose
parallel --profile dryverbose --profile nicetimeout echo ::: A B C

Output:

echo A
echo B
echo C

Spread the word

I hope you have learned something from this tutorial.

If you like GNU parallel:

(Re-)walk through the tutorial if you have not done so in the past year (https://www.gnu.org/software/parallel/parallel_tutorial.html)
Give a demo at your local user group/your team/your colleagues
Post the intro videos and the tutorial on Reddit, Mastodon, Diaspora*, forums, blogs, Identi.ca, Google+, Twitter, Facebook, LinkedIn, and mailing lists
Request or write a review for your favourite blog or magazine (especially if you do something cool with GNU parallel)
Invite me for your next conference

If you use GNU parallel for research:

Please cite GNU parallel in your publications (use --citation)

If GNU parallel saves you money:

(Have your company) donate to FSF or become a member https://my.fsf.org/donate/

2013-2025 Ole Tange, GFDLv1.3+ (See LICENSES/GFDL-1.3-or-later.txt)
