Parallel for-loop in bash with simultaneous sequential execution of another task with dependencies on the parallelized loop

Question

In my bash script, I need to execute two different functions, taskA and taskB, which take an integer ($i) as an argument. Since taskB $i depends on the completion of taskA $i, the following abbreviated piece of code does the job:

```bash
#!/bin/bash

taskA(){ ... }
taskB(){ ... }

for i in {1..100}; do
    taskA $i
    taskB $i
done
```

Because taskA can run independently for different values of $i, I can create a semaphore (taken from the Q&A "Parallelize a Bash FOR Loop") and execute it in parallel. However, taskB $i requires the completion of taskA $i and of the previous taskB $((i-1)). Therefore, I just run the taskB calls sequentially afterwards:

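```bash
#!/bin/bash

# Create an anonymous FIFO on fd 3 and preload it with N "000" tokens
open_sem(){
    mkfifo pipe-$$
    exec 3<>pipe-$$
    rm pipe-$$
    local i=$1
    for((;i>0;i--)); do
        printf %s 000 >&3
    done
}

# Take a token (abort if a previous job returned non-zero), run the
# command in the background, then put its exit status back as a token
run_with_lock(){
    local x
    read -u 3 -n 3 x && ((0==x)) || exit $x
    (
     ( "$@"; )
    printf '%.3d' $? >&3
    )&
}

taskA(){ ... }
taskB(){ ... }

N=36
open_sem $N
for i in {1..100}; do
    run_with_lock taskA $i
done
wait

for i in {1..100}; do
    taskB $i
done
```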
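The same cap on concurrent taskA jobs can probably also be expressed with bash's `wait -n` builtin instead of a FIFO of tokens. This is a swapped-in technique, not the asker's semaphore: a minimal sketch assuming bash 4.3 or newer (where `wait -n` was added), with `LOG`, `running`, and the dummy `taskA` body as hypothetical stand-ins.

```bash
#!/bin/bash
# Sketch: cap concurrent taskA jobs with `wait -n` (bash >= 4.3)
# instead of the FIFO semaphore. taskA and LOG are hypothetical dummies.

LOG=$(mktemp)
taskA() {
    sleep 0.1                 # pretend to do some work
    echo "A $1" >> "$LOG"
}

N=4                           # maximum number of taskA jobs at once
running=0
for i in {1..10}; do
    if (( running >= N )); then
        wait -n               # block until any one background job exits
        running=$((running - 1))
    fi
    taskA "$i" &
    running=$((running + 1))
done
wait                          # wait for the jobs still running
```

Unlike the FIFO version, this sketch does not abort on a failed job; `wait -n` returns the exit status of the job that finished, so a check could be added right after it if needed.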

In order to further optimize the procedure, is it possible to keep the semaphore for the parallel execution of taskA and run taskB simultaneously in such a way that it does not "overtake" taskA and waits for the completion of the taskA it depends on?

this seems – involved. First reaction here would be: are you sure it's not time to switch to something less shell, more programming language? (Marcus Müller, commented Apr 20, 2024 at 20:27)

BrianR · Accepted Answer · 2025-03-03 22:36:28Z

Since taskB must run sequentially, there isn't much room for optimizing that part. All you can do is parallelize taskA as much as possible. Could you make the last step of taskA something like `touch /dev/shm/taskA-$i` and the first step of taskB something like `until test -e /dev/shm/taskA-$i; do sleep 1; done`?

Then run the taskA jobs in the background with GNU Parallel and the taskB jobs in your for loop. As the A jobs finish (in whichever order they finish), the B jobs will never overtake them and will always run sequentially.
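A minimal runnable sketch of the marker-file idea follows. It assumes a temporary directory in place of /dev/shm (which exists on Linux but not on every system) and plain background jobs in place of GNU Parallel; `MARKDIR`, `LOG`, and the dummy task bodies are hypothetical.

```bash
#!/bin/bash
# Sketch: taskA signals completion by creating a marker file, and taskB
# polls for that file before doing its own work.
# MARKDIR (instead of /dev/shm) and LOG are hypothetical stand-ins.

MARKDIR=$(mktemp -d)
LOG=$MARKDIR/log

taskA() {
    sleep "0.$(( RANDOM % 3 ))"     # simulate uneven run times (0.0-0.2 s)
    echo "A $1" >> "$LOG"
    touch "$MARKDIR/taskA-$1"       # last step: signal completion
}

taskB() {
    # first step: block until the matching taskA has finished
    until [ -e "$MARKDIR/taskA-$1" ]; do sleep 0.05; done
    echo "B $1" >> "$LOG"
}

# Launch all taskA jobs in the background (GNU Parallel or the
# question's semaphore could cap concurrency here instead)
(
    for i in {1..10}; do taskA "$i" & done
    wait
) &
launcher=$!

# Run taskB sequentially in the foreground; it never overtakes taskA
for i in {1..10}; do
    taskB "$i"
done
wait "$launcher"
```

Polling every 0.05 s trades a little CPU for lower latency; the answer's `sleep 1` is gentler when each task runs for minutes. Fractional `sleep` arguments work with GNU and BSD `sleep` but are not guaranteed by POSIX.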

Ole Tange · Accepted Answer · 2024-05-27 23:16:36Z

However, taskB $i requires the completion of taskA $i and the previous taskB $(i-1).

AFAICT, you can do this by running a single taskA and a single taskB in parallel, namely taskA $i and taskB $((i-1)).

It can be done like this:

```bash
taskA() { echo dummy A=$1; }
taskB() { echo dummy B=$1; }
export -f taskA taskB

# Do the initial warm-up so we can run taskA $i and taskB $((i-1)) together
taskA 1
taskA 2
taskB 1
# Now taskA $i and taskB $((i-1)) are done for i=2

doit() {
    local i=$1
    local a=$i
    local b=$((i - 1))
    # Run 'taskA $a' and 'taskB $b' in parallel
    parallel ::: "taskA $a" "taskB $b"
}
export -f doit

seq 3 1000 | parallel -j1 doit
# The loop ends with taskB 999, so run the final taskB on its own
taskB 1000
```

GNU Parallel is started once for every $i, which adds roughly 0.2 s of overhead per iteration. If your jobs are short (< 1 s), this overhead might be too large.
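Since that overhead comes from starting GNU Parallel in every iteration, a plain-bash variant of the same pipeline may be worth trying when the tasks are short. This is a sketch, not the answer's code: it replaces `parallel` with `&` and `wait`, and `LOG`, `n`, and the dummy tasks are hypothetical.

```bash
#!/bin/bash
# Sketch: run taskA $i alongside taskB $((i-1)) with & and wait,
# avoiding one GNU Parallel start-up per iteration.
# LOG, n, and the task bodies are hypothetical stand-ins.

LOG=$(mktemp)
taskA() { echo "A $1" >> "$LOG"; }
taskB() { echo "B $1" >> "$LOG"; }

n=100
taskA 1                     # warm-up: no taskB to overlap with yet
for (( i = 2; i <= n; i++ )); do
    taskA "$i" &            # taskA $i runs in the background...
    taskB "$((i - 1))"      # ...while taskB $((i-1)) runs in the foreground
    wait                    # both must finish before the next step
done
taskB "$n"                  # the last taskB has no taskA to overlap with
```

Like the GNU Parallel version, this keeps at most two jobs running at once, so it pays off mainly when taskA and taskB take comparable time.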
