Using any `awk`, `sort`, and `cut` in any shell on every Unix box, and assuming your input is always formatted like the sample you provided in your question, where the lines to be sorted always have start/end tags, the other lines don't, and `<`s don't appear anywhere else in the input:

    $ cat tst.sh
    #!/usr/bin/env bash

    awk '
        BEGIN { FS="<"; OFS="\t" }
        {
            idx = ( (NF == 3) && (pNF == 3) ? idx : NR )
            print idx, $0
            pNF = NF
        }
    ' "${@:--}" |
    sort -k1,1n -k2,2 |
    cut -f2-

    $ ./tst.sh file
    <Module>
        <Settings>
            <Dimensions>
                <Length>2000</Length>
                <Volume>13000</Volume>
                <Width>5000</Width>
            </Dimensions>
            <Stats>
                <Max>3000</Max>
                <Mean>1.0</Mean>
                <Median>250</Median>
            </Stats>
        </Settings>
        <Debug>
            <Errors>
                <MagicMan>0</MagicMan>
                <Strike>0</Strike>
                <Wag>1</Wag>
            </Errors>
        </Debug>
    </Module>

The above uses `awk` to decorate the input to `sort` so that we can just run `sort` once on the whole file and then use `cut` to remove the number that `awk` added. Here are the intermediate steps so you can see what's happening:

    awk '
        BEGIN { FS="<"; OFS="\t" }
        {
            idx = ( (NF == 3) && (pNF == 3) ? idx : NR )
            print idx, $0
            pNF = NF
        }
    ' file
    1       <Module>
    2           <Settings>
    3               <Dimensions>
    4                   <Volume>13000</Volume>
    4                   <Width>5000</Width>
    4                   <Length>2000</Length>
    7               </Dimensions>
    8               <Stats>
    9                   <Mean>1.0</Mean>
    9                   <Max>3000</Max>
    9                   <Median>250</Median>
    12              </Stats>
    13          </Settings>
    14          <Debug>
    15              <Errors>
    16                  <Strike>0</Strike>
    16                  <Wag>1</Wag>
    16                  <MagicMan>0</MagicMan>
    19              </Errors>
    20          </Debug>
    21      </Module>

    awk '
        BEGIN { FS="<"; OFS="\t" }
        {
            idx = ( (NF == 3) && (pNF == 3) ? idx : NR )
            print idx, $0
            pNF = NF
        }
    ' file | sort -k1,1n -k2,2
    1       <Module>
    2           <Settings>
    3               <Dimensions>
    4                   <Length>2000</Length>
    4                   <Volume>13000</Volume>
    4                   <Width>5000</Width>
    7               </Dimensions>
    8               <Stats>
    9                   <Max>3000</Max>
    9                   <Mean>1.0</Mean>
    9                   <Median>250</Median>
    12              </Stats>
    13          </Settings>
    14          <Debug>
    15              <Errors>
    16                  <MagicMan>0</MagicMan>
    16                  <Strike>0</Strike>
    16                  <Wag>1</Wag>
    19              </Errors>
    20          </Debug>
    21      </Module>

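If it's not obvious why `NF == 3` picks out the lines to be sorted, here is a minimal sketch that prints the field count `awk` sees for each line when `FS` is `<`. The function name `show_nf` and the three sample lines are made up for illustration; they're not part of the script above.

```shell
# show_nf: hypothetical helper, prints "NF<TAB>line" using the same FS as tst.sh
show_nf() {
    awk 'BEGIN { FS="<"; OFS="\t" } { print NF, $0 }'
}

# An open tag, a start/end-tagged line, and a close tag.
# The text before the first "<" (empty or just indentation) counts as field 1.
printf '%s\n' '<Dimensions>' '    <Width>5000</Width>' '</Dimensions>' | show_nf
```

Only the middle line contains two `<`s, so it is the only one with 3 fields; the open and close tag lines each have 2. That is also why the approach depends on `<` not appearing anywhere else in the input.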
Alternatively, using GNU `awk` for `sorted_in`:

    $ cat tst.awk
    BEGIN { FS="<" }
    NF == 3 {
        rows[$0]
        f = 1
        next
    }
    f && (NF < 3) {
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for (row in rows) {
            print row
        }
        delete rows
        f = 0
    }
    { print }

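As a rough illustration of what `PROCINFO["sorted_in"]` does on its own, the sketch below stores a few made-up words as array indices and loops over them in `@ind_str_asc` order. `sorted_keys` is a hypothetical name, and the demo is guarded so it only runs where `gawk` is installed.

```shell
# sorted_keys: hypothetical helper; needs GNU awk for PROCINFO["sorted_in"].
sorted_keys() {
    gawk '
        { seen[$0] }    # store each line as an array index (duplicates collapse)
        END {
            PROCINFO["sorted_in"] = "@ind_str_asc"   # visit indices as strings, ascending
            for (k in seen) print k
        }
    '
}

# Only run the demo when gawk is available.
if command -v gawk >/dev/null 2>&1; then
    printf '%s\n' Width Length Volume Length | sorted_keys
fi
```

With `gawk` present this prints `Length`, `Volume`, `Width`, one per line. `Length` appears once because array indices are unique, so a script that stores lines this way would also print identical lines within a block only once.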
If you don't have GNU `awk` you can use any `awk` and any `sort` for that same approach:

    $ cat tst.awk
    BEGIN { FS="<" }
    NF == 3 {
        rows[$0]
        f = 1
        next
    }
    f && (NF < 3) {
        cmd = "sort"
        for (row in rows) {
            print row | cmd
        }
        close(cmd)
        delete rows
        f = 0
    }
    { print }

but it'll be much slower than the first 2 solutions above as it's spawning a subshell to call `sort` for every block of nested lines.
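Neither `tst.awk` listing shows an invocation; presumably you'd run it as `awk -f tst.awk file`. Here is a self-contained sketch of the any-awk variant with the script inlined in a hypothetical `sort_blocks` shell function and fed a made-up one-block input, so you can try it without creating any files.

```shell
# sort_blocks: hypothetical wrapper around the any-awk + sort approach.
sort_blocks() {
    awk '
        BEGIN { FS="<" }
        NF == 3 { rows[$0]; f = 1; next }    # collect start/end-tagged lines
        f && (NF < 3) {                      # first line after a collected block
            cmd = "sort"
            for (row in rows) print row | cmd
            close(cmd)                       # wait for sort to finish before going on
            delete rows
            f = 0
        }
        { print }
    ' "$@"
}

# Made-up input: three tagged lines between an open and a close tag.
printf '%s\n' '<Stats>' '<Median>250</Median>' '<Max>3000</Max>' '<Mean>1.0</Mean>' '</Stats>' |
    sort_blocks
```

The three tagged lines should come out as `Max`, `Mean`, `Median` between the untouched `<Stats>` and `</Stats>` lines. Each block costs one `sort` process, which is where the slowdown comes from on inputs with many blocks.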