I want to run a command in parallel on a bunch of files as part of a GitHub CI workflow (on an Ubuntu runner) in order to speed up the CI job. I would also like the parallel command to report its progress.
Currently my command looks something like this:
```sh
# ci/clang-tidy-parallel.sh
find src \
    ! -path "path/to/exclude/*" \
    -type f \( -name "*.cpp" -o -name "*.h" \) \
    | parallel --progress "clang-tidy-19 {}"
```
This works great when run from a shell on my own machine: the jobs are executed in parallel and a single line of output is shown with how many jobs are in progress and how many have finished already.
However, when run as part of the GitHub workflow, the output is kind of nasty:
- It prints the error

  ```
  sh: 1: cannot open /dev/tty: No such device or address
  ```

  a bunch of times.
- It prints way more progress output than necessary: something like 1700 lines of progress reports, while there are only about 80 jobs to run. Most of these lines are duplicates. E.g., the first couple of lines are:

  ```
  local:4/0/100%/0.0s
  local:4/0/100%/0.0s
  local:4/0/100%/0.0s
  local:4/0/100%/0.0s
  local:4/0/100%/0.0s
  local:4/0/100%/0.0s
  local:4/0/100%/0.0s
  local:4/0/100%/0.0s
  local:4/0/100%/0.0s
  local:4/0/100%/0.0s
  local:4/0/100%/0.0s
  local:4/0/100%/0.0s
  local:4/0/100%/0.0s
  local:4/0/100%/0.0s
  ```
If I run the command locally and redirect stderr to a file, I observe similar behavior:

```sh
ci/clang-tidy-parallel.sh 2>log
```

When the command has finished, the log file contains hundreds of lines of output. (Though no errors about missing /dev/tty.)
On the other hand, without the `--progress` option, the job just sits there with no visible output until it has completed, which is also not desirable.
Is there a way to configure GNU parallel so that it reports progress in a way that is friendly to non-terminal environments? In particular, I would like it to only print a line of output when the status of a parallel job has changed (which should mean getting one line per job if everything goes smoothly).
Thanks to Ole Tange for pointing me in the right direction. Based on his solution and some AI-assisted coding, I came up with this monstrosity:
```sh
file_list=$(find src \
    ! -path "path/to/exclude/*" \
    -type f \( -name "*.cpp" -o -name "*.h" \))
length=$(wc -w <<< "$file_list")
echo Running clang-tidy on $length files
echo "$file_list" | parallel --bar "clang-tidy-19 {}" \
    2> >( perl -pe 'BEGIN{$/="\r";$|=1};s/\r/\n/g' \
        | grep '%' \
        | perl -pe 'BEGIN{$|=1}s/\e\[[0-9;]*[a-zA-Z]//g' \
        | perl -pe "BEGIN{\$length=$length;$|=1} s|(\d+)% (\d+):\d+=\S+ (\S+).*|\$1% (\$2/\$length) -- \$3|" \
        | perl -ne 'BEGIN{$|=1}$s{$_}++ or print')
```
The raw output from `--bar` looks something like this:

```
# 0 sec src/tuner/Utilities.h 3.65853658536585 [7m3% 3:7[0m9=0s src/tuner/Utilities.h [0m
```

(With escape sequences to print the progress bar.)
The successive commands processing that output perform the following transformations:
- Transform carriage returns into newlines.
- Find lines containing percentage output.
- Strip out escape sequences.
- Perform a regex replacement to extract and format the number of files processed, the completion percentage, and the name of the file being processed. It also includes the total number of files to be processed via a shell variable.
- Print unique lines.
The `BEGIN{$|=1}` on the perl invocations is necessary to ensure output gets flushed immediately. The `-p` option runs perl on each line of input and prints the result. The `-n` option runs on each line of input but does not automatically print. The `-e` option provides the script as a CLI argument.
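The difference between `-p` and `-n` is easy to see on a throwaway input:

```shell
echo "abc" | perl -pe 's/a/X/'        # -p auto-prints each (modified) line → Xbc
echo "abc" | perl -ne 'print if /b/'  # -n prints only when told to → abc
echo "abc" | perl -ne 's/a/X/'        # -n with no print → (no output)
```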
The full pipeline generates output similar to this:

```
Running clang-tidy on 82 files
1% (1/82) -- src/tuner/LoadPositions.h
2% (2/82) -- src/tuner/Main.cpp
2% (2/82) -- src/tuner/Utilities.h
3% (3/82) -- src/tuner/Utilities.h
...
```
I'm sure there's a better way to do those perl scripts (and not have 4 of them). But this works, and my perl-foo is very weak.