Here's an example of how to do what you described using perl and a Hash-of-Array-of-Arrays (HoAoA) data structure.
To help in understanding this, the following man pages will be useful: `perldata` (perl data types), `perldsc` (data structures), `perllol` (lol = lists of lists), `perlref` (references) and `perlreftut` (tutorial for references). You can also get details on specific perl functions with the `perldoc` command - e.g. `perldoc -f opendir` or `perldoc -f grep`.
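If references are new to you, a tiny hand-built HoAoA may help before reading the full script. This is only a sketch: the folder names and numbers are made up for illustration, and `$first`, `$second`, `@row` and `$lines` are hypothetical variable names. It shows the two equivalent ways of dereferencing that you'll see in perl code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A small Hash-of-Array-of-Arrays built by hand (illustrative data only).
my $h = {
    Folder1 => [ [1, 2, 3], [4, 5, 6] ],
    Folder2 => [ [7, 8, 9] ],
};

# Two equivalent ways to reach one element through the reference:
my $first  = $$h{Folder1}[1][2];     # "double sigil" style
my $second = $h->{Folder1}[1][2];    # arrow style

# Dereference one inner array (a "line"), and count the lines in a folder.
my @row   = @{ $h->{Folder1}[0] };
my $lines = scalar @{ $h->{Folder1} };

print "element: $first / $second\n";
print "row 0: @row\n";
print "Folder1 has $lines lines\n";
```

Which style you use is mostly a matter of taste; the arrow form tends to be easier to read once the nesting gets deep.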
Note that the `sort` and `grep` used in the script are built-in perl functions. They are not the `sort` and `grep` command-line tools. You can call those from perl if you want to (with backticks or `qx` quoting, or the `system()` function, or with the `open()` function to open a pipe, and several other ways). Use `perldoc` for details on all of these and more.
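To make the distinction concrete, here's a minimal sketch of the built-in `sort` and `grep` working on a plain perl list, with no external commands involved. The list contents and variable names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up list of names, standing in for what readdir() might return.
my @names = qw(Folder3 notes.txt Folder1 Folder2 README);

# grep with a block keeps only the elements for which the block is true.
my @folders = grep { /^Folder\d+$/ } @names;

# sort without a block compares as strings; use <=> for numeric order.
my @sorted  = sort @folders;
my @numbers = sort { $a <=> $b } (10, 9, 100, 1);

# In scalar context, grep returns the number of matching elements.
my $count = grep { /\.txt$/ } @names;

print "@sorted\n";     # Folder1 Folder2 Folder3
print "@numbers\n";    # 1 9 10 100
print "$count\n";      # 1
```

Because these are functions operating on lists in memory, they can be chained directly, e.g. `sort grep { ... } readdir($dh)`.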
    $ cat HoAoA.pl
    #!/usr/bin/perl

    use strict;
    use warnings;
    use Data::Dump qw(dd);

    # $h is a ref to Hash-of-Array-of-Arrays (HoAoA).
    #
    # This will be a data structure with the directory names
    # (Folder1, Folder2, Folder3) as the hash keys of the top-level
    # hash.  Each element of that hash will be an array where the
    # indexes are the line numbers of the data.txt files in each
    # of those directories. The data in these second-level arrays
    # will be an array containing the three values in each line of
    # data.txt: $$h{directory}[line number][element]
    my $h;

    # get the directory name from the first command line arg, default to ./
    my $dir = shift // './';

    # get a list of subdirectories that contain 'data.txt',
    # excluding . and ..
    # (readdir returns bare names, so test them relative to $dir)
    opendir(my $dh, $dir) || die "Couldn't open directory $dir: $!\n";
    my @dirs = sort grep { $_ !~ /^\.+$/ && -d "$dir/$_" && -f "$dir/$_/data.txt" } readdir($dh);
    closedir($dh);

    dd \@dirs;   # Data::Dump's dd function is great for showing what's in an array
    print "\n";

    foreach my $d (@dirs) {
      my $f = "$dir/$d/data.txt";
      open(my $fh,"<",$f) || die "Couldn't open file $f: $!\n";

      my $lc=0;  # line counter
      while(<$fh>) {
        chomp;   # strip trailing newline char at end-of-line
        my @row = split /\s*,\s*/;   # assume simple comma-delimited values
        push @{ $$h{$d}[$lc++] }, @row;
      }
      close($fh);
    }

    # dd is even better for showing complex structured data
    dd $h;
    print "\n";

    # show how to access individual elements, e.g. by changing the
    # zeroth element of line 0 of 'Folder1' to 999.
    $$h{'Folder1'}[0][0] = 999;

    dd $h;
    print "\n";

    # show how to print the data without using Data::Dump
    # a loop like this can also be used to process the data.
    # You could also process the data in the main loop above
    # as the data is being read in.
    foreach my $d (sort keys %{ $h }) {   # `foreach my $d (@dirs)` would work too
      print "$d/data.txt:\n";
      foreach my $lc (keys @{ $$h{$d} }) {
        print "  line $lc: ", join("\t",@{ $$h{$d}[$lc] }), "\n";
      }
      print "\n";
    }
Note: the above is written to process simple comma-delimited data files. For actual CSV, with all its quirks and complications (like multi-line double-quoted fields with embedded commas), use the Text::CSV module. This is a third-party library module that isn't included with the core perl distribution. On Debian and related distros you can install this with `apt-get install libtext-csv-perl libtext-csv-xs-perl`. Other distros probably have similar package names. Or you can install it with `cpan` (a tool to install and manage library modules that IS included with perl core).
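For reference, reading rows with Text::CSV might look something like the sketch below. It assumes Text::CSV is installed; `read_csv_rows` is a hypothetical helper name, and the sample data is made up and read from an in-memory string rather than a real data.txt so that the example is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read every CSV record from an open filehandle into an array-of-arrays.
# Assumes the third-party Text::CSV module is installed.
sub read_csv_rows {
    my ($fh) = @_;
    require Text::CSV;
    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 })
        or die "Cannot use Text::CSV: " . Text::CSV->error_diag();
    my @rows;
    while (my $row = $csv->getline($fh)) {
        push @rows, $row;    # $row is an array ref holding the fields
    }
    return \@rows;
}

# Made-up sample data: one field has an embedded comma, another an
# embedded newline. A plain split on commas would mangle both.
my $data = qq{1,"two, with comma",3\n4,"multi\nline",6\n};

if (eval { require Text::CSV; 1 }) {
    open(my $fh, '<', \$data) || die "Couldn't open in-memory file: $!\n";
    my $rows = read_csv_rows($fh);
    close($fh);
    printf "%d records, %d fields in the first\n",
        scalar @$rows, scalar @{ $rows->[0] };
} else {
    print "Text::CSV is not installed\n";
}
```

In the script above, the array ref returned by `getline` could be stored directly as one line of the HoAoA in place of the `split`.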
Also note: the HoAoA.pl script uses the Data::Dump module. This is a third-party module which is useful for dumping structured data. Unfortunately, it's not included as part of the perl core library. On Debian etc. you can install it with `apt-get install libdata-dump-perl`. Other distros will have a similar package name. And, as a last resort, you can install it with `cpan`.
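If you can't install Data::Dump, the Data::Dumper module (which does ship with perl core) can do a similar job, although its output is more verbose. This is a swapped-in alternative, not what the script above uses; the data here is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Optional settings to make the output tidier and repeatable.
$Data::Dumper::Sortkeys = 1;   # print hash keys in sorted order
$Data::Dumper::Indent   = 1;   # simpler, fixed-width indentation
$Data::Dumper::Terse    = 1;   # leave out the leading "$VAR1 = "

# Illustrative HoAoA, same shape as the one built by the script.
my $h = {
    Folder1 => [ [1, 2, 3], [4, 5, 6] ],
    Folder2 => [ [7, 8, 9] ],
};

# Dumper() returns a string, so it has to be printed explicitly.
my $dump = Dumper($h);
print $dump;
```

Note that `Dumper` takes the data itself (here a reference) and returns text, whereas `dd` prints directly.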
Anyway, with the following folder structure and data.txt files:
    $ tail */data.txt
    ==> Folder1/data.txt <==
    1,2,3
    4,5,6
    7,8,9

    ==> Folder2/data.txt <==
    7,8,9
    4,5,6
    1,2,3

    ==> Folder3/data.txt <==
    9,8,7
    6,5,4
    3,2,1
running the HoAoA.pl script produces the following output:
    $ ./HoAoA.pl
    ["Folder1", "Folder2", "Folder3"]

    {
      Folder1 => [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
      Folder2 => [[7, 8, 9], [4, 5, 6], [1, 2, 3]],
      Folder3 => [[9, 8, 7], [6, 5, 4], [3, 2, 1]],
    }

    {
      Folder1 => [[999, 2, 3], [4, 5, 6], [7, 8, 9]],
      Folder2 => [[7, 8, 9], [4, 5, 6], [1, 2, 3]],
      Folder3 => [[9, 8, 7], [6, 5, 4], [3, 2, 1]],
    }

    Folder1/data.txt:
      line 0: 999	2	3
      line 1: 4	5	6
      line 2: 7	8	9

    Folder2/data.txt:
      line 0: 7	8	9
      line 1: 4	5	6
      line 2: 1	2	3

    Folder3/data.txt:
      line 0: 9	8	7
      line 1: 6	5	4
      line 2: 3	2	1
`earrN` must be arrays themselves, and if we look at a set of variables named `foo1`, `foo2`, etc. that does look a lot like an array in itself. Though you're right, it's not a 2D-array, but an array of arrays, since it doesn't need to be rectangular.

`earr$j`) any more, so why even do that? IMO you probably need to rethink your data structure from the ground up. And, as I said above, use a language that's actually suited to processing data instead of a language that's suited to co-ordinating the execution of other programs to do the data-processing work.

`earr$j` as one token (variable name), it will split the token at the `$` sign, not construct a token from the fixed string `earr` and the value of `$j`. Try `eval "earr$j+=(1)"` instead. Note that `eval` is potentially dangerous and should be used with caution - it tells your shell to evaluate the string and execute it. `eval` has its uses, but most of those uses are awkwardly trying to work around a deficiency in shell. Have I suggested using a better language yet? :-)