
I want to run a command in parallel on a bunch of files as part of a GitHub CI workflow (on an Ubuntu runner) to speed up the CI job. I would also like the parallel command to report its progress.

Currently my command looks something like this:

# ci/clang-tidy-parallel.sh
find src \
    ! -path "path/to/exclude/*" \
    -type f \( -name "*.cpp" -o -name "*.h" \) \
    | parallel --progress "clang-tidy-19 {}"

This works great when run from a shell on my own machine: the jobs are executed in parallel and a single line of output is shown with how many jobs are in progress and how many have finished already.

However, when run as part of the GitHub workflow the output is kind of nasty:

  1. It prints the error sh: 1: cannot open /dev/tty: No such device or address a bunch of times.
  2. It prints way more progress output than necessary. Something like 1700 lines of progress reports, while there are only about 80 jobs to run. Most of these lines are duplicates. E.g., the first couple of lines are:
     local:4/0/100%/0.0s
     local:4/0/100%/0.0s
     local:4/0/100%/0.0s
     local:4/0/100%/0.0s
     local:4/0/100%/0.0s
     local:4/0/100%/0.0s
     local:4/0/100%/0.0s
     local:4/0/100%/0.0s
     local:4/0/100%/0.0s
     local:4/0/100%/0.0s
     local:4/0/100%/0.0s
     local:4/0/100%/0.0s
     local:4/0/100%/0.0s
     local:4/0/100%/0.0s

If I run the command locally and redirect stderr to a file, I observe similar behavior:

ci/clang-tidy-parallel.sh 2>log 

When the command has finished, the log file contains hundreds of lines of output. (Though no errors about missing /dev/tty.)

On the other hand, without the --progress option, the job just sits there with no visible output until it has completed, which is also not desirable.

Is there a way to configure GNU parallel so that it reports progress in a way that is friendly to non-terminal environments? In particular, I would like it to only print a line of output when the status of a parallel job has changed (which should mean getting one line per job if everything goes smoothly).


Thanks to Ole Tange for pointing me in the right direction. Based on his solution and some AI-assisted coding I came up with this monstrosity:

file_list=$(find src \
    ! -path "path/to/exclude/*" \
    -type f \( -name "*.cpp" -o -name "*.h" \))
length=$(wc -w <<< "$file_list")
echo Running clang-tidy on $length files
echo "$file_list" | parallel --bar "clang-tidy-19 {}" \
    2> >(
        perl -pe 'BEGIN{$/="\r";$|=1};s/\r/\n/g' |
        grep '%' |
        perl -pe 'BEGIN{$|=1}s/\e\[[0-9;]*[a-zA-Z]//g' |
        perl -pe "BEGIN{\$length=$length;$|=1} s|(\d+)% (\d+):\d+=\S+ (\S+).*|\$1% (\$2/\$length) -- \$3|" |
        perl -ne 'BEGIN{$|=1}$s{$_}++ or print')

The raw output from --bar looks something like this:

# 0 sec src/tuner/Utilities.h
3.65853658536585
\e[7m3% 3:7\e[0m9=0s src/tuner/Utilities.h \e[0m

(Here \e stands for the escape character; the escape sequences draw the progress bar.)

The successive commands processing that output perform the following transformations:

  • Transform carriage returns into newlines.
  • Find lines containing percentage output.
  • Strip out escape sequences.
  • Perform a regex replacement to extract and format the number of files processed, the completion percentage, and the name of the file being processed. It also includes the total number of files to be processed via a shell variable.
  • Print unique lines.

The BEGIN{$|=1} on the perl invocations is necessary to ensure output gets flushed immediately.

The -p option runs the perl script on each line of input and prints the result. The -n option also runs it on each line of input but does not print automatically. The -e option supplies the script as a command-line argument.

It generates output similar to this:

Running clang-tidy on 82 files
1% (1/82) -- src/tuner/LoadPositions.h
2% (2/82) -- src/tuner/Main.cpp
2% (2/82) -- src/tuner/Utilities.h
3% (3/82) -- src/tuner/Utilities.h
...

I'm sure there's a better way to do those perl scripts (and not have 4 of them). But this works, and my perl-foo is very weak.


    1 Answer

    You are looking for --bar:

    seq 1000 | parallel --bar sleep .{} \
        2> >(perl -pe 'BEGIN{$/="\r";$|=1};s/\r/\n/g' |
             grep '#' |
             perl -ne '$s{$_}++ or print')

    --bar deceptively looks as if it just generates a bar. But it also generates input for zenity:

    seq 1000 | parallel -j10 --bar '(echo -n {};sleep 0.1)' \
        2> >(perl -pe 'BEGIN{$/="\r";$|=1};s/\r/\n/g' |
             zenity --progress --auto-kill)

    The line starting with '#' is the text shown in zenity. The line with the decimal number is the percentage done.

    Or maybe you are looking for --joblog:

    seq 1000 | parallel --joblog my.log sleep 1.{} &
    sleep 10
    cat my.log
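If --joblog is the route taken, the log can be turned into one progress line per finished job. A minimal sketch, assuming the usual tab-separated joblog columns (Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command). The name joblog_progress is made up, and the total job count is assumed to be known up front:

```shell
# Hypothetical helper: read a GNU parallel joblog on stdin and print
# "finished/total (exit N) -- command" for every completed job.
# $1 is the total number of jobs.
joblog_progress() {
    awk -F '\t' -v total="$1" '
        NR == 1 { next }    # skip the header line
        {
            # Jobs finish in any order, so count log lines instead of using Seq.
            printf "%d/%d (exit %s) -- %s\n", NR - 1, total, $7, $9
            fflush()        # keep the output live when reading from tail -f
        }
    '
}

# Possible use in CI (assumes GNU tail for -F and --pid):
#   seq 1000 | parallel --joblog my.log sleep 1.{} &
#   pid=$!
#   tail -n +1 -F --pid="$pid" my.log 2>/dev/null | joblog_progress 1000
#   wait "$pid"
```

parallel appends a line to the joblog when a job completes, so this gives one line per job and nothing in between.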
    • Thanks Ole! I adapted your solution to output different information. I don't know any Perl, so my code is very ugly, but it works!
      – JSQuareD
      Commented yesterday
