In my bash script, I need to execute two different functions, taskA and taskB, which take an integer ($i) as an argument. Since taskB $i depends on the completion of taskA $i, the following abbreviated piece of code does the job:

    #!/bin/bash
    taskA(){ ... }
    taskB(){ ... }
    for i in {1..100}; do
        taskA $i
        taskB $i
    done

As taskA can be run for different $i independently, I can create a semaphore (taken from the question "Parallelize a Bash FOR Loop") and execute it in parallel. However, taskB $i requires the completion of taskA $i and the previous taskB $(i-1). Therefore, I just run them sequentially afterwards:

    #!/bin/bash
    open_sem(){
        mkfifo pipe-$$
        exec 3<>pipe-$$
        rm pipe-$$
        local i=$1
        for((;i>0;i--)); do
            printf %s 000 >&3
        done
    }
    run_with_lock(){
        local x
        read -u 3 -n 3 x && ((0==x)) || exit $x
        (
            ( "$@"; )
            printf '%.3d' $? >&3
        )&
    }
    taskA(){ ... }
    taskB(){ ... }
    N=36
    open_sem $N
    for i in {1..100}; do
        run_with_lock taskA $i
    done
    wait
    for i in {1..100}; do
        taskB $i
    done

In order to further optimize the procedure, is it possible to keep the semaphore for the parallel execution of taskA and run taskB simultaneously in such a way that it does not "overtake" taskA and waits for the completion of the taskA it depends on?

  • this seems – involved. First reaction here would be: are you sure it's not time to switch to something less shell, more programming language? Commented Apr 20, 2024 at 20:27
  • ( taskA $i; taskB $i ) & ? Commented Apr 21, 2024 at 12:46

2 Answers

1

Since taskB must run sequentially, there isn't much room for optimizing that part of it. All you can do is parallelize taskA as much as possible. Could you just make the last step of taskA something like

    touch /dev/shm/taskA-$i

and the first step of taskB something like

    until test -e /dev/shm/taskA-$i; do sleep 1; done

so that each taskB waits for the marker file written by its own taskA?

Then run the taskA jobs in the background with GNU Parallel, and the taskB jobs in your for loop. As the A's finish (in whichever order they finish), the B's will never overtake them and will always run sequentially.
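
The marker-file handshake described above can be sketched as follows. This is only an illustration under a few assumptions: the taskA and taskB bodies are placeholders, the markers live in a private mktemp directory rather than /dev/shm, the taskA jobs are started with plain & instead of GNU Parallel or the semaphore, and the order.log file is a hypothetical addition that only serves to show the order in which taskB ran.

```shell
#!/bin/bash
# Sketch of the marker-file handshake; task bodies are placeholders.
markers=$(mktemp -d)                # assumption: private dir instead of /dev/shm
results=$markers/order.log          # hypothetical log, only to show taskB order
trap 'rm -rf "$markers"' EXIT       # remove markers when the script exits

taskA() {
    sleep "0.$((RANDOM % 3))"       # stand-in for work of varying length
    touch "$markers/taskA-$1"       # last step: signal that taskA $1 is done
}

taskB() {
    # First step: block until the matching taskA has written its marker.
    until test -e "$markers/taskA-$1"; do sleep 0.1; done
    echo "$1" >> "$results"         # stand-in for the real taskB work
}

N=10

# Start every taskA in the background (unthrottled here for brevity).
for i in $(seq 1 "$N"); do
    taskA "$i" &
done

# Run taskB strictly in order; each one waits only for its own taskA.
for i in $(seq 1 "$N"); do
    taskB "$i"
done
wait
```

In a real script the first loop would presumably be replaced by the throttled launcher (run_with_lock or GNU Parallel), itself put in the background so that the taskB loop can start right away. Polling with sleep costs up to one polling interval of latency per task, which matters only if the tasks are very short.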

    0

    However, taskB $i requires the completion of taskA $i and the previous taskB $(i-1).

    AFAICT, you can do this by running a single taskA and a single taskB in parallel, namely taskA $i and taskB $i-1.

    It can be done like this:

        taskA() { echo dummy A=$1; }
        taskB() { echo dummy B=$1; }
        export -f taskA taskB
        # Do the initial warm up so we can run taskA $i, taskB $i-1
        taskA 1
        taskA 2
        taskB 1
        # Now taskA $i and taskB $i-1 are done for $i=2
        doit() {
            i=$1
            a=$i
            b=$((i - 1))
            # Run 'taskA $i' and 'taskB $i-1' in parallel
            parallel ::: "taskA $a" "taskB $b"
        }
        export -f doit
        seq 3 1000 | parallel -j1 doit
        # The loop ends with taskB 999, so run the last taskB separately
        taskB 1000

    GNU Parallel will be started once for every $i, which adds around 0.2 sec of overhead per iteration. If your jobs are short (< 1 sec), this overhead might be too big.
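
If the per-iteration startup cost is a concern, the same pairing of taskA $i with taskB $i-1 can be sketched with plain bash job control. This is a swapped-in variant, not the GNU Parallel approach above: it assumes taskA and taskB are shell functions in the current script, the bodies here are placeholders, and the log file is a hypothetical addition used only to make the ordering visible.

```shell
#!/bin/bash
# Sketch: overlap taskA $i with taskB $((i-1)) using a background job and wait.
log=$(mktemp)                        # hypothetical log, only to show ordering

taskA() { echo "A $1" >> "$log"; }   # placeholder bodies
taskB() { echo "B $1" >> "$log"; }

N=20

taskA 1                              # warm-up: nothing to overlap with yet
for ((i = 2; i <= N; i++)); do
    taskA "$i" &                     # next taskA runs in the background ...
    taskB "$((i - 1))"               # ... while the previous taskB runs here
    wait                             # both must finish before the next round
done
taskB "$N"                           # the last taskB has no taskA to pair with
```

Like the original, this runs at most two tasks at a time, so it helps most when taskA and taskB take roughly the same time. It does not use the N-slot semaphore from the question.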
