
I have 3 files with different numbers of lines. I am trying to loop through the three files and print the 1st line from each file into a new file output1.txt, then the 2nd line from each file into another new file output2.txt, and so on. Because the number of lines differs between the files, if file2 or file3 has no entry for a given line number, it should simply be skipped and nothing printed for it in the corresponding output file. How can I achieve this in bash?

file1

xyz
abc
def
ghi
jkl

file2

123
456
789

file3

ax1
by2

OUTPUT FILES

output1.txt

xyz
123
ax1

output2.txt

abc
456
by2

output3.txt

def
789

output4.txt

ghi 

output5.txt

jkl 
  • What have you tried? Commented Sep 3, 2021 at 0:47

2 Answers


Use bash to tell awk to do it; that's how bash is meant to be used (and not, for example, to do the text processing itself).

For example, the following awk one-liner writes each input line to an output file whose name is constructed from the literal string "output", the current line number within the current input file (awk variable FNR), and the literal string ".txt":

$ awk '{print > "output" FNR ".txt"}' file*
$ tail output*
==> output1.txt <==
xyz
123
ax1

==> output2.txt <==
abc
456
by2

==> output3.txt <==
def
789

==> output4.txt <==
ghi

==> output5.txt <==
jkl

Note: If you have lots of output files (hundreds or more), you may run into problems. With some versions of awk, if you exceed the number of file handles allowed to your process by the kernel and login environment, it may just die with an error. With other versions of awk (e.g. GNU awk), it may just slow down while it manages which file handles are open for write at any given moment. Unless some of your input files are hundreds of lines long, it's not likely to be a problem.
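If you are unsure how close you are to that limit, you can check it from the shell before running the one-liner. A minimal sketch; the numbers shown are just typical example values, not what you will necessarily see on your system:

$ ulimit -n        # current soft limit on open file descriptors
1024
$ ulimit -Hn       # hard limit the soft limit may be raised to
4096
$ ulimit -n 4096   # raise the soft limit for this shell and its child processes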

The following will work with any version of awk with input files of any length (because it only ever has one output file open for write at a time) but it will be significantly slower because it opens the output file for each write and closes it immediately after the write. Even so, it will still be many times faster than doing this in shell.

awk '{
  # use 5-digit zero-padded output filenames for this version
  # e.g. output00001.txt
  out = sprintf("output%05i.txt", FNR);

  if (out in files) {
    # we have written to this file before, so append to it
    print >> out
  } else {
    # first write to this file, so create or truncate it.
    print > out
    files[out]=1
  }
  close(out)
}' file*
  • file{1..3}.txt might do it better, though; I'm not sure which version introduced that form of brace expansion, it might require bash 4+ (see the sketch after these comments).
    – Jetchisel
    Commented Sep 3, 2021 at 10:06
  • I originally had file[123] as the file glob arg to awk, but changed it to file*.
    – cas
    Commented Sep 3, 2021 at 11:22
  • Thank you for the solution! Commented Sep 3, 2021 at 20:06
  • @bharathjavvadhi If this solves your issue, please consider accepting the answer. Accepting an answer marks the issue as resolved.
    – Kusalananda
    Commented Sep 6, 2021 at 12:13
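As an aside on the file{1..3} vs file[123] vs file* discussion in the comments above: brace expansion and globbing are not the same thing. Brace expansion is generated by bash unconditionally, whether or not the named files exist, while a glob only expands to files that are actually present. A quick illustrative sketch, assuming only file1, file2 and file3 exist in the current directory:

$ echo file{1..3}    # brace expansion: the names are generated unconditionally
file1 file2 file3
$ echo file{1..5}    # ...including names of files that do not exist
file1 file2 file3 file4 file5
$ echo file[125]     # glob: only expands to the files that are really there
file1 file2
$ echo file*         # glob: anything whose name starts with "file"
file1 file2 file3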

The solution below, which my colleague suggested, has worked for me. Thank you all for taking the time to answer my question; I really appreciate it.

# for flattening and merging the files and writing to a tempfile
pr -J -m -t file* --sep-string=';' > mergedfile

# Now you have data in each line which can be looped over and redirected to
# the respective filenames based on the loop count.
# The number of output files will be equal to the number of lines in the
# biggest input file; in this case, you will see 240 files.
i=1
while read line; do
  echo "=======file no: $i ========"
  echo $line | sed -e 's@^;@@g' -e 's@;;@@g' | tr ';' '\n' > output${i}.txt
  let i=i+1
done < mergedfile
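For the three sample files in the question, the intermediate mergedfile would look roughly like this (a sketch assuming GNU pr keeps emitting the ; separators for files that have run out of lines; if it does not, the sed step simply has nothing to strip):

xyz;123;ax1
abc;456;by2
def;789;
ghi;;
jkl;;

The sed expressions then strip leading and doubled semicolons (which appear once one of the input files is exhausted), and tr splits what remains back into one word per line for each outputN.txt.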
  • This does not preserve whitespace in the data, and it may possibly expand certain escape-sequences. It also would produce unexpected results if any line contained filename globbing patterns.
    – Kusalananda
    Commented Sep 6, 2021 at 12:16
  • Leaving aside the quirks of plain read (without -r and without clearing IFS), echo, and the unquoted variable expansion, this also seems awfully awkward, and it would produce invalid results if the input data contains semicolons. Was there something wrong with the awk solution posted earlier?
    – ilkkachu
    Commented Sep 6, 2021 at 12:32
