Not precisely what you asked for, but it can be adapted.
This processes all files with the suffix `.txt` in the current directory. For each file (e.g. `Cairo.txt`):

- It uses `tr` to replace all whitespace with newlines, giving a plain one-word-per-line list.
- It uses `fmt` to pack a whole number of words into lines, up to a specified length.
- It uses `split` to cut those lines into a series of files named `Cairo.seq.0000` and up.
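To make the stages concrete, here is a sketch of each one run against a single hypothetical file named `Cairo.txt` (the file name and the `head` calls are only for illustration):

```
# Stage 1: squeeze every run of whitespace into one newline -> one word per line
tr -s '[:space:]' '\n' < Cairo.txt | head -n 5

# Stage 2: re-pack the words into lines of at most 60 characters
tr -s '[:space:]' '\n' < Cairo.txt | fmt -60 | head -n 5

# Stage 3: cut the packed lines into 30-line files Cairo.seq.0000, Cairo.seq.0001, ...
tr -s '[:space:]' '\n' < Cairo.txt | fmt -60 | split -a 4 -d -l 30 - Cairo.seq.
```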
For testability, I used width 60 and lines 30, and my input was three plain-text man pages generated with this:
```
for cmd in tr fmt split; do man $cmd | col -b > $cmd.txt; done
```
This is the script:
```
#! /bin/bash

# For every .txt file here: make a one-word-per-line stream, re-pack it into
# lines of up to 60 characters, then cut the result into 30-line files named
# <Base>.seq.0000, <Base>.seq.0001, ...
for fn in ./*.txt; do
    Base="${fn%.txt}"
    tr -s '[:space:]' '\n' < "${fn}" |
        fmt -60 |
        split -a 4 -d -l 30 - "./${Base}.seq."
done
```
The line width is the "60" in the `fmt` command, so you might want to change that to 100.
The number of lines per output file is the "30" in the `split` command. You seemed to want 1 line per file, but you are going to get a lot of small files that way: a 100-byte file still occupies a 4096-byte disk block.
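If you do want 100-character lines and one line per output file, the loop body would become something like the sketch below (only the `fmt` width and the `split` line count change; note that with `-d -a 4` the numeric suffix allows at most 10000 pieces per input file):

```
# Hypothetical variant of the loop body: pack to 100 characters, one line per file
tr -s '[:space:]' '\n' < "${fn}" |
    fmt -100 |
    split -a 4 -d -l 1 - "./${Base}.seq."
```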
In the `wc` output below, you can see that the word count of each file is unchanged, while the whitespace is squeezed and the line count drops.
```
paul $ wc *
   29  214  1561 fmt.seq.0000
   61  214  1832 fmt.txt
   30  260  1665 split.seq.0000
   15  101   780 split.seq.0001
   94  361  2892 split.txt
   30  263  1724 tr.seq.0000
   18  126   929 tr.seq.0001
  124  389  3282 tr.txt
  410 1955 14821 total
paul $
```
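If you want an explicit per-file check rather than reading the `wc` columns, something like this (assuming the `.txt` files and their `.seq.*` pieces are still in the current directory) compares the word counts before and after:

```
for fn in ./*.txt; do
    Base="${fn%.txt}"
    printf '%s: %s words in, %s words out\n' \
        "${Base}" "$(wc -w < "${fn}")" "$(cat "${Base}".seq.* | wc -w)"
done
```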