Format column-based text file into tree structure using bash

Name: shell script - Format column-based text file into tree structure using bash - Unix & Linux Stack Exchange

Question

Is there a Unix/Linux command that can turn this:

    AMERICA  USA     NEW_YORK     AB-100
    AMERICA  USA     NEW_YORK     VF-200
    AMERICA  USA     NEW_YORK     XY-243
    AMERICA  USA     LOS_ANGELES  UH-198
    AMERICA  CANADA  TORONTO      UT-876
    AMERICA  CANADA  TORONTO      UT-877
    AMERICA  CANADA  VANCOUVER    UT-871
    AMERICA  CANADA  VANCOUVER    UT-872
    AMERICA  CANADA  VANCOUVER    UT-873
    AMERICA  MEXICO  MEXICO       OU-098
    AMERICA  MEXICO  MONTERREY    OU-099
    AMERICA  MEXICO  MONTERREY    OU-100
    EUROPE   FRANCE  PARIS        IV-122
    EUROPE   FRANCE  PARIS        AV-112
    EUROPE   FRANCE  PARIS        IF-111
    EUROPE   FRANCE  PARIS        XX-190
    EUROPE   FRANCE  TOULOUSE     TL-654

Into this:

    AMERICA
        USA
            NEW_YORK
                AB-100
                VF-200
                XY-243
            LOS_ANGELES
                UH-198
        CANADA
            TORONTO
                UT-876
                UT-877
            VANCOUVER
                UT-871
                UT-872
                UT-873
        MEXICO
            MEXICO
                OU-098
            MONTERREY
                OU-099
                OU-100
    EUROPE
        FRANCE
            PARIS
                IV-122
                AV-112
                IF-111
                XX-190
            TOULOUSE
                TL-654

Why use bash? Can we use different tools? Bash can do this, but it will be needlessly complex. — terdon, Commented Jun 11, 2015 at 16:01
Given the spacing in your data, I suppose the input fields are TAB-separated? — Janis, Commented Jun 12, 2015 at 3:59
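
The separator matters for the answers that follow: the sed and the awk -F $'\t' commands expect tabs, while the awk and Perl one-liners in the last answer split on any whitespace. If the file turns out to be space-aligned, a minimal sketch for normalizing it could look like this. The function name to_tsv is hypothetical, and the sketch assumes that no field contains whitespace itself.

```shell
# Hypothetical helper (not from the thread): rewrite whitespace-separated
# rows read from stdin as tab-separated rows.
# Assumption: no field contains spaces or tabs of its own.
to_tsv() {
    # Assigning $1 to itself makes awk rebuild the record with OFS
    # between the fields.
    awk -v OFS='\t' '{ $1 = $1; print }'
}
```

It would be used as to_tsv < file > file.tsv before running any of the tab-based commands.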

Cyrus · Accepted Answer · 2015-06-12 05:50:57Z

For your example:

    dir=$(mktemp -d)
    sed 's|\t|/|g' file | while read -r line; do mkdir -p "$dir/$line"; done
    (cd "$dir"; tree)
    rm -r "$dir"

Output:

    .
    ├── AMERICA
    │   ├── CANADA
    │   │   ├── TORONTO
    │   │   │   ├── UT-876
    │   │   │   └── UT-877
    │   │   └── VANCOUVER
    │   │       ├── UT-871
    │   │       ├── UT-872
    │   │       └── UT-873
    │   ├── MEXICO
    │   │   ├── MEXICO
    │   │   │   └── OU-098
    │   │   └── MONTERREY
    │   │       ├── OU-099
    │   │       └── OU-100
    │   └── USA
    │       ├── LOS_ANGELES
    │       │   └── UH-198
    │       └── NEW_YORK
    │           ├── AB-100
    │           ├── VF-200
    │           └── XY-243
    └── EUROPE
        └── FRANCE
            ├── PARIS
            │   ├── AV-112
            │   ├── IF-111
            │   ├── IV-122
            │   └── XX-190
            └── TOULOUSE
                └── TL-654

Now that's sneaky, well done! I suggest using dir=$(mktemp -d);sed 's| \+|/|g' file | while read -r line; do mkdir -p "$dir/$line"; done; tree "$dir" | sed 's/[│└─├]\+/ /g'; rm -r "$dir" to i) avoid the needless cds and ii) remove the indentation characters. — terdon, Commented Jun 11, 2015 at 18:38
@Cyrus: Interesting idea! Though I'd not create a lot of directories when all we need is text processing. Note also that the single cd in your code is wrong; it works only if you happened to be in your HOME directory when starting the code sequence. Either use cd - or perform cd "$dir"; tree in a subshell, (cd "$dir"; tree), and remove the final cd. — Janis, Commented Jun 12, 2015 at 4:11
@Janis: I've updated my answer with your hint to use a subshell. — Cyrus, Commented Jun 12, 2015 at 5:52
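
As a rough sketch of how the temporary-directory idea and the subshell advice fit together, the steps can be wrapped in a function. The name dirs_to_listing is hypothetical, and find piped through sort is swapped in for tree purely so that the sketch runs where tree is not installed; the listing is therefore alphabetical and shows plain paths instead of a drawn tree. It assumes tab-separated input on stdin and fields that contain neither "/" nor "..".

```shell
# Hypothetical helper (not from the thread): build a throwaway directory
# tree from tab-separated rows on stdin, list it, then remove it.
# Assumptions: fields contain no "/" or ".."; find + sort stands in for tree.
dirs_to_listing() {
    tmpdir=$(mktemp -d) || return 1
    # Turn each row "A<TAB>B<TAB>C" into the path "A/B/C" and create it.
    tr '\t' '/' | while IFS= read -r path; do
        if [ -n "$path" ]; then
            mkdir -p "$tmpdir/$path"
        fi
    done
    # The subshell keeps the cd from changing the caller's directory.
    (cd "$tmpdir" && find . -mindepth 1 | LC_ALL=C sort)
    rm -r "$tmpdir"
}
```

Called as dirs_to_listing < file, it leaves the working directory untouched regardless of where it was started.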

Janis · Accepted Answer · 2015-06-12 09:20:36Z

An awk script that works with standard awks, keeps the lines in their original order, and handles an arbitrary number of columns in the data:

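    awk -F $'\t' '
    # Print n tab characters; i is a local variable.
    function indent (n,   i) { for (i=1; i<=n; i++) printf "\t" }
    {
        # Print each field that differs from the same column of the previous row.
        for (i=1; i<=NF; i++)
            if ($i != o[i]) {
                printf "%s%s\n", indent(i-1), $i
                o[i] = $i
            }
    }
    '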

The same program logic can be implemented in shell. For bash:

    # Print $1 tab characters.
    indent () { ind=$( printf "%*s" "$1" '' ) ; printf '%s' "${ind// /$'\t'}" ;}

    while IFS=$'\t' read -r -a f
    do
        # ${#f[@]} is the number of fields in the current row.
        for ((i=0; i<${#f[@]}; i++))
        do
            if [[ "${f[i]}" != "${o[i]}" ]]
            then
                printf "%s%s\n" "$( indent "$i" )" "${f[i]}"
                o[i]=${f[i]}
            fi
        done
    done

Note: For ksh, replace read -a with read -A.
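
One edge case of the column-comparison logic: a field is printed only when it differs from the same column of the previous row, so a child that keeps its name while its parent changes is not repeated under the new parent. The sample data does not trigger this. The following is a minimal sketch of one way to handle it, not the answer's own code: once a column changes, every column to its right is printed as well. The names tree_from_columns, prev and changed are made up for illustration.

```shell
# Hypothetical variant (not from the thread) of the column-comparison idea.
# Reads tab-separated rows on stdin and prints a tab-indented tree.
tree_from_columns() {
    awk -F '\t' '
    {
        changed = 0
        for (i = 1; i <= NF; i++) {
            # Print a field if it differs from the previous row, or if
            # any column to its left has just changed.
            if (changed || $i != prev[i]) {
                for (j = 1; j < i; j++) printf "\t"
                print $i
                changed = 1
            }
            prev[i] = $i
        }
    }'
}
```

With the made-up rows US/SPRINGFIELD/A-1 followed by CA/SPRINGFIELD/B-2, SPRINGFIELD is printed under both US and CA.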

terdon · Accepted Answer · 2015-06-11 21:57:30Z

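In awk, you could do the following (it needs GNU awk 4.0 or later for arrays of arrays, and the for ... in loops visit keys in no particular order):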

    $ awk '{
        a[$1][$2][$3] ? a[$1][$2][$3]=a[$1][$2][$3]"\n\t\t\t"$4 : a[$1][$2][$3]="\t\t\t"$4 ;
      }
      END{
        for(cont in a){
          printf "%s\n", cont;
          for(count in a[cont]){
            printf "\t%s\n", count;
            for(city in a[cont][count]){
              print "\t\t"city"\n"a[cont][count][city]
            }
          }
        }
      }' file
    EUROPE
        FRANCE
            TOULOUSE
                TL-654
            PARIS
                IV-122
                AV-112
                IF-111
                XX-190
    AMERICA
        USA
            NEW_YORK
                AB-100
                VF-200
                XY-243
            LOS_ANGELES
                UH-198
        CANADA
            VANCOUVER
                UT-871
                UT-872
                UT-873
            TORONTO
                UT-876
                UT-877
        MEXICO
            MEXICO
                OU-098
            MONTERREY
                OU-099
                OU-100

In Perl:

    $ perl -lane 'push @{$k{$F[0]}{$F[1]}{$F[2]}},"\t\t\t".$F[3];
        END{
          for $cont (keys(%k)){
            print "$cont";
            for $coun (keys(%{$k{$cont}})){
              print "\t$coun";
              for $city (keys(%{$k{$cont}{$coun}})){
                print "\t\t$city\n", join "\n",@{$k{$cont}{$coun}{$city}}
              }
            }
          }
        }' file
    EUROPE
        FRANCE
            PARIS
                IV-122
                AV-112
                IF-111
                XX-190
            TOULOUSE
                TL-654
    AMERICA
        USA
            NEW_YORK
                AB-100
                VF-200
                XY-243
            LOS_ANGELES
                UH-198
        MEXICO
            MONTERREY
                OU-099
                OU-100
            MEXICO
                OU-098
        CANADA
            VANCOUVER
                UT-871
                UT-872
                UT-873
            TORONTO
                UT-876
                UT-877

@user1598390 that's odd. Did you copy/paste directly into your shell or did you copy it manually? — terdon, Commented Jun 11, 2015 at 16:32
@user1598390 see updated answer. There were some bugs in the awk code (though they wouldn't have given that error) and I also added a Perl solution. — terdon, Commented Jun 11, 2015 at 17:11
The Perl version runs but prints at most two items at the end of a branch. For example, New York has three items but it only prints two, and those two are the last one repeated. — Tulains Córdova, Commented Jun 11, 2015 at 18:36
@user1598390 sorry again, another bug. The new version should work. I'm still curious about the awk though. I just tested again and it works fine. — terdon, Commented Jun 11, 2015 at 21:58
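
Both hash-based versions group rows that are not adjacent in the input, at the cost of an unpredictable key order. A different technique, decorate-sort-undecorate, can give the grouping while keeping first-appearance order and using only standard awk, sort and cut. The sketch below is an illustration under stated assumptions, not part of any answer: the name group_rows is hypothetical, and the input must be tab-separated with exactly four columns.

```shell
# Hypothetical helper (not from the thread): reorder 4-column, tab-separated
# rows on stdin so that rows sharing a prefix become adjacent, with groups
# kept in order of first appearance.
group_rows() {
    awk -F '\t' '
    {
        key = ""
        # Prefix the row with the line number at which each of its
        # leading-column prefixes was first seen, then its own line number.
        for (i = 1; i < NF; i++) {
            key = key FS $i
            if (!(key in first)) first[key] = NR
            printf "%d\t", first[key]
        }
        printf "%d\t%s\n", NR, $0
    }' |
    # Sort numerically on the four prefixed numbers, then strip them off.
    sort -t "$(printf '\t')" -k1,1n -k2,2n -k3,3n -k4,4n |
    cut -f5-
}
```

The regrouped rows can then be piped into any script that relies on equal rows being adjacent.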
